refactor: Introduce db WithTx + migrate Expected handlers#462
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Summary by CodeRabbit
WalkthroughRefactors several handlers to use new DB transaction helpers ( Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Handler as HTTP_Handler
participant DB as Database
participant Temporal as Temporal_Workflow
rect rgba(200,200,255,0.5)
Client->>Handler: Create/Update/Delete request
Handler->>DB: cdb.WithTxResult / cdb.WithTx (begin tx, invoke callback)
DB->>DB: BeginTx
DB->>DB: Invoke callback(tx)
DB->>Temporal: common.ExecuteSyncWorkflow (inside callback)
Temporal-->>DB: Workflow result / error
DB-->>DB: Commit (on callback success) or Rollback (on error/panic)
DB-->>Handler: result or error
end
Handler->>Client: HTTP response (via HandleTxError if tx error)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@db/pkg/db/tx.go`:
- Around line 90-101: WithTxOpts (and similarly WithTxResultOpts) needs to
guarantee rollback if the callback panics after BeginTx succeeds: wrap the
callback invocation with a defer that checks for panic via recover and calls
tx.Rollback() (ignoring its error), then re-panics to propagate the original
panic; also ensure the defer runs before calling fn(tx) so any later returns
still commit/rollback correctly, and keep existing error handling that rolls
back on returned errors and commits on success.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 225f415b-8340-4e09-866f-f3d00c948b7a
📒 Files selected for processing (7)
api/pkg/api/handler/expectedmachine.goapi/pkg/api/handler/expectedpowershelf.goapi/pkg/api/handler/expectedswitch.goapi/pkg/api/handler/util/common/common.gocommon/pkg/util/api.godb/pkg/db/tx.godb/pkg/db/tx_test.go
944ce6f to
0579204
Compare
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-04-30 21:45:32 UTC | Commit: 0579204 |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
api/pkg/api/handler/util/common/common.go (1)
609-620: ⚡ Quick winAdd focused coverage for
HandleTxError.This helper now sits on every refactored Expected* mutation path, so it is worth pinning down its two contracts explicitly: preserving wrapped
*cutil.APIErrorvalues and mapping non-API transaction failures to the fallback 500 response. A small Echo-based unit test here would prevent future regressions from silently converting application errors into generic transaction failures.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/pkg/api/handler/util/common/common.go` around lines 609 - 620, Add focused unit tests for HandleTxError: write an Echo-based test that exercises two cases—(1) when the passed error wraps a *cutil.APIError (use errors.Wrap or fmt.Errorf("%w", apiErr)) to ensure HandleTxError preserves apiErr.Code, apiErr.Message and apiErr.Data by asserting the HTTP response code and JSON body returned by cutil.NewAPIErrorResponse; and (2) when the error is a non-API error to ensure it logs the original error and returns http.StatusInternalServerError with the provided fallback message and nil data. Use the existing HandleTxError function, a test Echo context/request/recorder, and assertions on response status and JSON payload; optionally mock or capture the zerolog.Logger output to assert logging in the non-API error case.db/pkg/db/tx_test.go (1)
382-524: ⚡ Quick winExercise the
Optsvariants explicitly.This suite proves the default wrappers well, but the newly introduced API surface is four helpers, not two. Please add one test that passes a non-default
sql.TxOptionsintoWithTxOpts/WithTxResultOptsso a regression in option forwarding is caught.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db/pkg/db/tx_test.go` around lines 382 - 524, Add a new unit test that explicitly exercises the Opts variants by calling WithTxOpts and WithTxResultOpts with a non-default sql.TxOptions (e.g. sql.TxOptions{Isolation: sql.LevelSerializable, ReadOnly: false}) and asserting normal commit behavior and returned value visibility; locate this alongside the other tests (same suite using testTxGetTestSession, testTxSetupSchema, GetIDB, TestTable) and mirror the existing success assertions from TestWithTx_CommitsOnSuccess and TestWithTxResult_CommitsAndReturnsValue so the options are forwarded and the inserted row is visible after the call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@db/pkg/db/tx_test.go`:
- Around line 424-435: Replace the non-fatal assertion on session creation in
TestWithTx_PropagatesBeginTxError with a fatal check and ensure the session is
closed: use require.NoError(t, err) after calling NewSession (so the test stops
on init failure) and defer badSession.Close() (or close it when initialization
succeeds) before calling WithTx; this involves the NewSession symbol, the
TestWithTx_PropagatesBeginTxError test, and the WithTx/Tx usage so the test
fails fast and does not proceed with a nil/bad session.
---
Nitpick comments:
In `@api/pkg/api/handler/util/common/common.go`:
- Around line 609-620: Add focused unit tests for HandleTxError: write an
Echo-based test that exercises two cases—(1) when the passed error wraps a
*cutil.APIError (use errors.Wrap or fmt.Errorf("%w", apiErr)) to ensure
HandleTxError preserves apiErr.Code, apiErr.Message and apiErr.Data by asserting
the HTTP response code and JSON body returned by cutil.NewAPIErrorResponse; and
(2) when the error is a non-API error to ensure it logs the original error and
returns http.StatusInternalServerError with the provided fallback message and
nil data. Use the existing HandleTxError function, a test Echo
context/request/recorder, and assertions on response status and JSON payload;
optionally mock or capture the zerolog.Logger output to assert logging in the
non-API error case.
In `@db/pkg/db/tx_test.go`:
- Around line 382-524: Add a new unit test that explicitly exercises the Opts
variants by calling WithTxOpts and WithTxResultOpts with a non-default
sql.TxOptions (e.g. sql.TxOptions{Isolation: sql.LevelSerializable, ReadOnly:
false}) and asserting normal commit behavior and returned value visibility;
locate this alongside the other tests (same suite using testTxGetTestSession,
testTxSetupSchema, GetIDB, TestTable) and mirror the existing success assertions
from TestWithTx_CommitsOnSuccess and TestWithTxResult_CommitsAndReturnsValue so
the options are forwarded and the inserted row is visible after the call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5e01caf0-3a50-49be-95f2-2417506795e2
📒 Files selected for processing (7)
api/pkg/api/handler/expectedmachine.goapi/pkg/api/handler/expectedpowershelf.goapi/pkg/api/handler/expectedswitch.goapi/pkg/api/handler/util/common/common.gocommon/pkg/util/api.godb/pkg/db/tx.godb/pkg/db/tx_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- common/pkg/util/api.go
- api/pkg/api/handler/expectedswitch.go
0579204 to
8066811
Compare
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/pkg/api/handler/expectedswitch.go`:
- Around line 158-214: The transaction closure passed to cdb.WithTxResult (the
block that calls esDAO.Create and returns es) is performing Temporal work
(calling cesh.scp.GetClientByID and common.ExecuteSyncWorkflow) while the DB
transaction is still open; move all Temporal-related logic out of that closure
so the closure only creates the ExpectedSwitch DB record and returns it.
Specifically: in the function using WithTxResult, keep only the esDAO.Create
call and return es (avoid calling cesh.scp.GetClientByID, building
workflowOptions, or calling common.ExecuteSyncWorkflow inside the closure);
after WithTxResult returns successfully (no tx error), retrieve the site client
via cesh.scp.GetClientByID and call common.ExecuteSyncWorkflow with
createExpectedSwitchRequest and workflowOptions, converting any workflow errors
to the same API errors currently returned. Ensure the workflow ID construction
("expected-switch-create-"+es.ID.String()) and createExpectedSwitchRequest use
the committed es returned from WithTxResult so Temporal execution happens
post-commit (or replace with an outbox mechanism if preferred).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 373ae46c-cb8a-45fe-b210-38da067c66de
📒 Files selected for processing (7)
api/pkg/api/handler/expectedmachine.goapi/pkg/api/handler/expectedpowershelf.goapi/pkg/api/handler/expectedswitch.goapi/pkg/api/handler/util/common/common.gocommon/pkg/util/api.godb/pkg/db/tx.godb/pkg/db/tx_test.go
✅ Files skipped from review due to trivial changes (1)
- api/pkg/api/handler/expectedpowershelf.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/pkg/api/handler/expectedmachine.go
thossain-nv
left a comment
There was a problem hiding this comment.
Looks good, thank you @chet! Added a suggestion for sending friendlier errors to the user.
| func WithTxOpts(ctx context.Context, dbSession *Session, opts *sql.TxOptions, fn func(tx *Tx) error) error { | ||
| tx, err := BeginTx(ctx, dbSession, opts) | ||
| if err != nil { | ||
| return fmt.Errorf("begin tx: %w", err) |
There was a problem hiding this comment.
Should this be a well defined error e.g. ErrTransactionInitiation so we can send users back a nice message in HandleTxError e.g. Failed to complete request, DB transaction initiation error
| return err | ||
| } | ||
| if err := tx.Commit(); err != nil { | ||
| return fmt.Errorf("commit tx: %w", err) |
There was a problem hiding this comment.
Similar to above: ErrTransactionCommit and Failed to complete request, DB transaction commit error
Following on all of the `Expected*` cleanup work I've been doing, I was looking at the database bits, and wanted to attempt to clean things up a bit even more there.
This is modeled on the `with_txn` pattern from the "Core" `api-db` crate, and adds four helpers:
```
cdb.WithTx(ctx, dbSession, func(tx *cdb.Tx) error { ... })
cdb.WithTxOpts(ctx, dbSession, opts, fn) // to provide non-default sql.TxOptions
cdb.WithTxResult(ctx, dbSession, fn) // where the closure returns (T, error)
cdb.WithTxResultOpts(ctx, dbSession, opts, fn)
```
Where if the closure:
- Returns `nil`, the `tx` is committed.
- Returns an `err`, the `tx`is rolled back.
Callers no longer manage `Begin`/`Commit`/`Rollback` or maintain a "did we commit?" flag manually -- the closure boundary IS the commit boundary. The whole class of "*forgot to set txCommitted = true after Commit*" bugs goes away because there's no flag to forget.
For now, I'm only wiring up the `Expected*` handlers to keep the size of this PR small, so it's easier to understand what's going on, what things look like after being refactored, etc.
Assuming we like it, I can do another PR that pulls more handlers in.
Tests added!
Signed-off-by: Chet Nichols III <chetn@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@db/pkg/db/tx.go`:
- Around line 121-144: WithTxResultOpts currently returns plain-wrapped errors
for BeginTx and Commit which prevents sentinel checks; change the error returns
so the BeginTx failure uses the ErrTransactionInitiation sentinel (e.g., wrap
the BeginTx error with fmt.Errorf("%w: %v", ErrTransactionInitiation, err) or
errors.Join/Wrap as your project prefers) and the Commit failure uses
ErrTransactionCommit (wrap the tx.Commit() error with ErrTransactionCommit
similarly); update the return paths in WithTxResultOpts that reference BeginTx
and tx.Commit to return those wrapped sentinel errors so errors.Is(err,
ErrTransactionInitiation/ErrTransactionCommit) works as intended.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 84220d40-bc93-4165-b8d0-4e25e39a9ee2
📒 Files selected for processing (8)
api/pkg/api/handler/expectedmachine.goapi/pkg/api/handler/expectedpowershelf.goapi/pkg/api/handler/expectedswitch.goapi/pkg/api/handler/util/common/common.gocommon/pkg/util/api.godb/pkg/db/error.godb/pkg/db/tx.godb/pkg/db/tx_test.go
✅ Files skipped from review due to trivial changes (1)
- db/pkg/db/error.go
🚧 Files skipped from review as they are similar to previous changes (2)
- common/pkg/util/api.go
- db/pkg/db/tx_test.go
| // WithTxResultOpts is the value-returning variant of WithTxOpts. | ||
| func WithTxResultOpts[T any](ctx context.Context, dbSession *Session, opts *sql.TxOptions, fn func(tx *Tx) (T, error)) (T, error) { | ||
| var zero T | ||
| tx, err := BeginTx(ctx, dbSession, opts) | ||
| if err != nil { | ||
| return zero, fmt.Errorf("begin tx: %w", err) | ||
| } | ||
| // If fn panics, ensure the tx is rolled back so we don't leak an open | ||
| // transaction (and any locks it holds) until the connection drops. | ||
| // The original panic is re-raised after rollback. | ||
| defer func() { | ||
| if p := recover(); p != nil { | ||
| _ = tx.Rollback() | ||
| panic(p) | ||
| } | ||
| }() | ||
| result, err := fn(tx) | ||
| if err != nil { | ||
| _ = tx.Rollback() | ||
| return zero, err | ||
| } | ||
| if err := tx.Commit(); err != nil { | ||
| return zero, fmt.Errorf("commit tx: %w", err) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspect WithTxResultOpts implementation:"
sed -n '121,146p' db/pkg/db/tx.go
echo
echo "Compare sentinel usage between WithTxOpts and WithTxResultOpts:"
rg -n 'WithTxOpts|WithTxResultOpts|ErrTransactionInitiation|ErrTransactionCommit|begin tx:|commit tx:' db/pkg/db/tx.go -C 2
echo
echo "Check tx helper tests for errors.Is coverage on WithTxResult paths:"
rg -n 'WithTxResult|ErrTransactionInitiation|ErrTransactionCommit|errors\.Is' db/pkg/db/tx_test.go -C 2 || trueRepository: NVIDIA/infra-controller-rest
Length of output: 4150
🏁 Script executed:
# Search for HandleTxError and its error.Is checks
rg -n 'HandleTxError|errors\.Is.*ErrTransaction' --type go -C 3
# Check for WithTxResult usage in handlers
rg -n 'WithTxResult|WithTxResultOpts' --type go -C 1
# Verify SPDX header in tx.go
head -20 db/pkg/db/tx.go | grep -i 'spdx\|copyright'Repository: NVIDIA/infra-controller-rest
Length of output: 13560
🏁 Script executed:
# Verify the error wrapping semantics - show exact formats used
rg -n 'fmt\.Errorf.*ErrTransaction' db/pkg/db/tx.go -C 1
rg -n 'fmt\.Errorf.*begin tx|commit tx' db/pkg/db/tx.go -C 1Repository: NVIDIA/infra-controller-rest
Length of output: 456
Wrap WithTxResultOpts error returns with transaction sentinels to enable phase-specific error handling
Lines 126 and 143 wrap errors with plain text messages ("begin tx" and "commit tx") instead of the sentinel values ErrTransactionInitiation and ErrTransactionCommit. This breaks the contract documented in db/pkg/db/error.go, which explicitly states that both WithTx*/WithTxResult* functions should return these sentinels. Consequently, errors.Is(err, cdb.ErrTransactionInitiation/Commit) checks in common.HandleTxError() will fail for all WithTxResult call sites, causing transaction failures to fall through to generic fallback messages rather than phase-specific error responses. This affects multiple production handlers that depend on this error routing.
Proposed fix
func WithTxResultOpts[T any](ctx context.Context, dbSession *Session, opts *sql.TxOptions, fn func(tx *Tx) (T, error)) (T, error) {
var zero T
tx, err := BeginTx(ctx, dbSession, opts)
if err != nil {
- return zero, fmt.Errorf("begin tx: %w", err)
+ return zero, fmt.Errorf("%w: %w", ErrTransactionInitiation, err)
}
// If fn panics, ensure the tx is rolled back so we don't leak an open
// transaction (and any locks it holds) until the connection drops.
// The original panic is re-raised after rollback.
defer func() {
if p := recover(); p != nil {
_ = tx.Rollback()
panic(p)
}
}()
result, err := fn(tx)
if err != nil {
_ = tx.Rollback()
return zero, err
}
if err := tx.Commit(); err != nil {
- return zero, fmt.Errorf("commit tx: %w", err)
+ return zero, fmt.Errorf("%w: %w", ErrTransactionCommit, err)
}
return result, nil
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@db/pkg/db/tx.go` around lines 121 - 144, WithTxResultOpts currently returns
plain-wrapped errors for BeginTx and Commit which prevents sentinel checks;
change the error returns so the BeginTx failure uses the
ErrTransactionInitiation sentinel (e.g., wrap the BeginTx error with
fmt.Errorf("%w: %v", ErrTransactionInitiation, err) or errors.Join/Wrap as your
project prefers) and the Commit failure uses ErrTransactionCommit (wrap the
tx.Commit() error with ErrTransactionCommit similarly); update the return paths
in WithTxResultOpts that reference BeginTx and tx.Commit to return those wrapped
sentinel errors so errors.Is(err, ErrTransactionInitiation/ErrTransactionCommit)
works as intended.
Apply the new `WithTx` pattern from NVIDIA#462 to the `operatingsystem` handlers. Each handler now performs database writes and synchronous `Site` workflow execution inside a `cdb.WithTx` closure. On workflow timeout, the `*stc.TerminateWorkflow*` call is captured into a `timeoutResp` closure, so the transaction can roll back first (before the workflow is terminated), and a 500 surfaced via `NewAPIErrorResponse`. `BeginTx` + `Commit` failures flow through `HandleTxError` automatically. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This applies the new `WithTx` pattern from NVIDIA#462 to the `vpc` handlers (Create/Update/Delete plus the virtualization-type update). The `UpdateVirtualization` handler uses the `timeoutResp` closure pattern for `common.TerminateWorkflowOnTimeOut` so the helper still runs after the tx unwinds; the other three retain their inline `stc.TerminateWorkflow` paths Keeping these PRs smaller and tightly scoped so they're: - In theory a little easier to read. - More tightly scoped/less "blast radius" per PR. - A little nicer on/for @coderabbitai. 😆 I do I wish the diffs were easier to read, but it is what it is! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the `WithTx` pattern from NVIDIA#462 to the Create/Update/Delete `sshkey` handlers. The post-commit `SyncSSHKeyGroup` workflow triggers stay *outside* the closure (so their log-and-continue error handling doesn't roll back the tx). Keeping these PRs smaller and tightly scoped so they're: - In theory a little easier to read. - More tightly scoped/less "blast radius" per PR. - A little nicer on/for @coderabbitai. 😆 I do I wish the diffs were easier to read, but it is what it is! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the new `WithTx` pattern from NVIDIA#462 to the `operatingsystem` handlers. Each handler now performs database writes and synchronous `Site` workflow execution inside a `cdb.WithTx` closure. On workflow timeout, the `*stc.TerminateWorkflow*` call is captured into a `timeoutResp` closure, so the transaction can roll back first (before the workflow is terminated), and a 500 surfaced via `NewAPIErrorResponse`. `BeginTx` + `Commit` failures flow through `HandleTxError` automatically. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the `WithTx` pattern from NVIDIA#462 to the Create/Update/Delete `sshkey` handlers. The post-commit `SyncSSHKeyGroup` workflow triggers stay *outside* the closure (so their log-and-continue error handling doesn't roll back the tx). Keeping these PRs smaller and tightly scoped so they're: - In theory a little easier to read. - More tightly scoped/less "blast radius" per PR. - A little nicer on/for @coderabbitai. 😆 I do I wish the diffs were easier to read, but it is what it is! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the new `WithTx` pattern from NVIDIA#462 to the `operatingsystem` handlers. Each handler now performs database writes and synchronous `Site` workflow execution inside a `cdb.WithTx` closure. On workflow timeout, the `*stc.TerminateWorkflow*` call is captured into a `timeoutResp` closure, so the transaction can roll back first (before the workflow is terminated), and a 500 surfaced via `NewAPIErrorResponse`. `BeginTx` + `Commit` failures flow through `HandleTxError` automatically. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four `BeginTx` sites in `machine`. @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four `BeginTx` sites in `machine`. @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` (from NVIDIA#462) to the Create/Update/Delete `allocation` handlers. Used `WithTx` across the board (and not `WithTxResult`) since each handler needs multiple values out, which is the case in other handlers too, as well as including the `TerminateWorkflowOnTimeOut` helper @coderabbitai feedback addressed in advance (I just ran `coderabbit` CLI locally to save back/forth): - Log the correct error variable on the `GetAllAllocationConstraintsForInstanceType` failure path. - Defensive `nil` check on `ac.DerivedResourceID` before dereferencing it. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the new `WithTx` pattern from NVIDIA#462 to the `operatingsystem` handlers. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` (and `WithTxResult`!) from NVIDIA#462 to the `Create`/`Update`/`Delete` NSG handlers. Implements our "`timeoutResp` pattern" (which is something we had introduced in NVIDIA#472, and then @coderabbitai said we should be consistent by doing it everywhere). TLDR is the existing code calls `common.TerminateWorkflowOnTimeOut` on timeout, but we want to defer that until after the transaction is unwound + DB connection back (because we don't want it to block waiting on the network). The adjustment (which we've done before, but figured I'd call it out more explicitly here) is effectively: ``` var timeoutResp func() error err = cdb.WithTx(ctx, ..., func(tx *cdb.Tx) error { ... if /* workflow timeout detected */ { // capture the terminate work, but DON'T do it yet timeoutResp = func() error { return common.TerminateWorkflowOnTimeOut(...) } return cutil.NewAPIError(...) // forces rollback } ... }) // rollback has now completed, now we do potentially blocking network work if timeoutResp != nil { return timeoutResp() } ``` Also addressed some @coderabbitai feedback around log messages in advance. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four `BeginTx` sites in `machine`, including using the `TerminateWorkflowOnTimeOut` helper. @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four API handlers in `machine`, including using the `TerminateWorkflowOnTimeOut` helper. This also introduces a `MachineStatusMissing` enum to improve how we handle errors for missing machines (which came as a recommendation from @coderabbitai). @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` (from NVIDIA#462) to the Create/Update/Delete `allocation` handlers. Used `WithTx` across the board (and not `WithTxResult`) since each handler needs multiple values out, which is the case in other handlers too, as well as including the `TerminateWorkflowOnTimeOut` helper @coderabbitai feedback addressed in advance (I just ran `coderabbit` CLI locally to save back/forth): - Log the correct error variable on the `GetAllAllocationConstraintsForInstanceType` failure path. - Defensive `nil` check on `ac.DerivedResourceID` before dereferencing it. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies the `WithTx` pattern from NVIDIA#462 to the NVLink Logical Partition handlers. Integrates `TerminateWorkflowOnTimeout` as well for extra squeaky clean-ness. CodeRabbit feedback addressed: - Propagate `StatusDetail` creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
…lper Apply `WithTx` from NVIDIA#462 to the Instance Type handlers, ensuring the splitting of assign/condition are maintained, and `TerminateWorkflowOnTimeOut` is implemented where needed. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four API handlers in `machine`, including using the `TerminateWorkflowOnTimeOut` helper. This also introduces a `MachineStatusMissing` enum to improve how we handle errors for missing machines (which came as a recommendation from @coderabbitai). @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
…lper Apply `WithTx` from NVIDIA#462 to the Instance Type handlers, ensuring the splitting of assign/condition are maintained, and `TerminateWorkflowOnTimeOut` is implemented where needed. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies the `WithTx` pattern from NVIDIA#462 to the NVLink Logical Partition handlers. Integrates `TerminateWorkflowOnTimeout` as well for extra squeaky clean-ness. CodeRabbit feedback addressed: - Propagate `StatusDetail` creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
…lper Apply `WithTx` from NVIDIA#462 to the Instance Type handlers, ensuring the splitting of assign/condition are maintained, and `TerminateWorkflowOnTimeOut` is implemented where needed. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies the `WithTx` pattern from NVIDIA#462 to the NVLink Logical Partition handlers. Integrates `TerminateWorkflowOnTimeout` as well for extra squeaky clean-ness. CodeRabbit feedback addressed: - Propagate `StatusDetail` creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies the `WithTx` pattern from NVIDIA#462 to the NVLink Logical Partition handlers. Integrates `TerminateWorkflowOnTimeout` as well for extra squeaky clean-ness. CodeRabbit feedback addressed: - Propagate `StatusDetail` creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four API handlers in `machine`, including using the `TerminateWorkflowOnTimeOut` helper. This also introduces a `MachineStatusMissing` enum to improve how we handle errors for missing machines (which came as a recommendation from @coderabbitai). @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply `WithTx` (from NVIDIA#462) to all four API handlers in `machine`, including using the `TerminateWorkflowOnTimeOut` helper. This also introduces a `MachineStatusMissing` enum to improve how we handle errors for missing machines (which came as a recommendation from @coderabbitai). @coderabbitai also gave some feedback, which has also been addressed, where this: - Makes sure to use the contextual `logger` instead of the global `log` in the `maintenance-mode` workflow error path. - Passes `tx` to `common.GetInfrastructureProviderForOrg` so the read participates in the same transaction as the surrounding writes. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the new `WithTx` pattern from NVIDIA#462 to the `operatingsystem` handlers, using `TerminateWorkflowOnTimeOut` in a number of new sites. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` from NVIDIA#462 to the VPC Prefix API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies the `WithTx` pattern from NVIDIA#462 to the NVLink Logical Partition handlers. Integrates `TerminateWorkflowOnTimeout` as well for extra squeaky clean-ness. CodeRabbit feedback addressed: - Propagate `StatusDetail` creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. This also: - Moves Create reads out of WithTx (to happen before) after a callout from @thossain-nv. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. This also: - Moves Create reads out of WithTx (to happen before) after a callout from @thossain-nv. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. This also moves Create and Update reads out of WithTx (to happen before) after a callout from @thossain-nv. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. This also moves: - Create reads out of WithTx (to happen before) after a callout from @thossain-nv. - *Some* Update reads out of WithTx (the advisory lock, version check, and assoication reads need to stay within the txn). Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. This also moves: - Create reads out of WithTx (to happen before) after a callout from @thossain-nv. - *Some* Update reads out of WithTx (the advisory lock, version check, and assoication reads need to stay within the txn). Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Description
Following on all of the
Expected*cleanup work I've been doing, I was looking at the database bits, and wanted to attempt to clean things up a bit even more there.This is modeled on the
with_txnpattern from the "Core"api-dbcrate, and adds four helpers:Where if the closure:
nil, thetxis committed.err, thetxis rolled back.Callers no longer manage
Begin/Commit/Rollbackor maintain a "did we commit?" flag manually -- the closure boundary IS the commit boundary. The whole class of "forgot to set txCommitted = true after Commit()" bugs goes away because there's no flag to forget.Example usage:
For now, I'm only wiring up the
Expected*handlers to keep the size of this PR small, so it's easier to understand what's going on, what things look like after being refactored, etc.Assuming we like it, I can do another PR that pulls more handlers in.
Tests added!
Signed-off-by: Chet Nichols III chetn@nvidia.com
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes