refactor: Migrate SSH Key Group API handlers to WithTx#499
refactor: Migrate SSH Key Group API handlers to WithTx#499chet wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Summary by CodeRabbit
WalkthroughReplaces manual database/sql transaction handling with cdb.WithTx across Create, Update, and Delete SSH Key Group handlers; adds advisory locking inside transactions, richer validation and association state updates, status detail rows and version management inside TX, and moves workflow triggers for sync/delete to post-commit execution. ChangesSSH Key Group Handler Refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-05-06 21:01:29 UTC | Commit: c403074 |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/pkg/api/handler/sshkeygroup.go (1)
498-501:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove the version check under the advisory lock.
The optimistic concurrency guard is currently evaluated on a pre-transaction snapshot. Two concurrent PATCHes can both pass Line 498, then serialize on the lock and still apply stale writes. Reload the SSH Key Group after Line 543 and compare
Versioninside the locked transaction before mutating anything.Suggested fix
- // Verify version with current one - if *skg.Version != *apiRequest.Version { - return cutil.NewAPIErrorResponse(c, http.StatusForbidden, "Version for SSH Key Group in request does not match with current SSH Key Group. Please fetch latest object before updating.", nil) - } - skgDAO := cdbm.NewSSHKeyGroupDAO(uskgh.dbSession) @@ err = cdb.WithTx(ctx, uskgh.dbSession, func(tx *cdb.Tx) error { // Acquire an advisory lock on the SSH Key Group on which there could be contention // this lock is released when the transaction commits or rollsback derr := tx.TryAcquireAdvisoryLock(ctx, cdb.GetAdvisoryLockIDFromString(skg.ID.String()), nil) if derr != nil { logger.Error().Err(derr).Msg("Failed to acquire advisory lock on SSH Key Group") return cutil.NewAPIError(http.StatusInternalServerError, "Failed to update SSH Key Group, could not acquire DB lock", nil) } + + lockedSKG, derr := common.GetSSHKeyGroupFromIDString(ctx, tx, sshKeyGroupStrID, uskgh.dbSession, nil) + if derr != nil { + logger.Error().Err(derr).Msg("error retrieving SSH Key Group from DB by ID") + return cutil.NewAPIError(http.StatusInternalServerError, "Failed to update SSH Key Group, DB error", nil) + } + if *lockedSKG.Version != *apiRequest.Version { + return cutil.NewAPIError(http.StatusForbidden, "Version for SSH Key Group in request does not match with current SSH Key Group. Please fetch latest object before updating.", nil) + } + skg = lockedSKGAlso applies to: 540-547
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/sshkeygroup.go` around lines 498 - 501, Remove the optimistic version check currently performed on skg before the advisory lock and instead, after you acquire the advisory lock and are inside the transaction (i.e., immediately after the lock acquisition), re-load the SSH Key Group from the DB (using the same loader used earlier that fetched skg) and compare the reloaded entity's Version with apiRequest.Version; if they differ return the same cutil.NewAPIErrorResponse(..., http.StatusForbidden, ...) and abort the update. Apply this same change to the other update path noted (the code region currently around the other version check), so all version checks happen inside the locked transaction against a freshly reloaded SSH Key Group before any mutation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Around line 785-827: The response is built from dbskgsd which is fetched
before the syncRequired branch inserts a new status-detail; after creating the
new SSHKeyGroupStatusSyncing entry (sdDAO.CreateFromParams) you must re-query
status details into dbskgsd (call sdDAO.GetAllByEntityID with the same params
used earlier) so the response includes the newly created status-detail; place
this refresh immediately after the CreateFromParams success path (and keep
existing error handling).
- Around line 574-576: The code only loads existing associations when
apiRequest.SiteIDs or apiRequest.SSHKeyIDs are present, causing partial PATCHes
to miss the other side's current associations; always fetch both sets up front:
call skgsaDAO.GetAll(...) to populate existingSiteAssociationIDMap and
skgkaDAO.GetAll(...) to populate existingKeyAssociationIDMap before any
apiRequest field checks, then use apiRequest.SiteIDs / apiRequest.SSHKeyIDs only
to decide adds/removals and to set Syncing and syncRequired accordingly (leave
the rest of the sync/state logic unchanged).
---
Outside diff comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Around line 498-501: Remove the optimistic version check currently performed
on skg before the advisory lock and instead, after you acquire the advisory lock
and are inside the transaction (i.e., immediately after the lock acquisition),
re-load the SSH Key Group from the DB (using the same loader used earlier that
fetched skg) and compare the reloaded entity's Version with apiRequest.Version;
if they differ return the same cutil.NewAPIErrorResponse(...,
http.StatusForbidden, ...) and abort the update. Apply this same change to the
other update path noted (the code region currently around the other version
check), so all version checks happen inside the locked transaction against a
freshly reloaded SSH Key Group before any mutation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6b34d897-2de1-4c0d-9ab4-b97df6202781
📒 Files selected for processing (1)
api/pkg/api/handler/sshkeygroup.go
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Around line 540-547: The optimistic version check must be repeated inside the
transaction after acquiring the advisory lock to avoid lost updates: after
tx.TryAcquireAdvisoryLock (inside the cdb.WithTx callback) re-read the current
SSH Key Group row/version (use the same store/DB accessor used elsewhere in this
handler) and compare its version to the incoming request's expected version; if
they differ return an API error (e.g. via cutil.NewAPIError) and abort the
transaction so stale updates are rejected. Ensure you reference the same skg.ID
used for the lock and return the same HTTP 409/appropriate error on mismatch.
- Around line 737-749: The loop that flips existingSiteAssociationIDMap entries
to Syncing must skip associations already in the deleting state: inside the for
stID, sga := range existingSiteAssociationIDMap loop (before calling
skgsaDAO.UpdateFromParams), add a guard that if sga.Status ==
cdbm.SSHKeyGroupSiteAssociationStatusDeleting then continue (do not call
UpdateFromParams); this prevents re-activating entries during key-only PATCHes
(when apiRequest.SiteIDs == nil) and ensures only non-deleting associations are
transitioned to cdbm.SSHKeyGroupSiteAssociationStatusSyncing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e6479dc1-e771-488e-9bb6-5b5e31534286
📒 Files selected for processing (1)
api/pkg/api/handler/sshkeygroup.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
api/pkg/api/handler/sshkeygroup.go (2)
662-662: 💤 Low valueOmit unnecessary blank identifier in range expression.
Same as above—idiomatic Go elides the blank identifier.
♻️ Suggested refinement
- for sgaID, _ := range deletingSiteAssociationIDMap { + for sgaID := range deletingSiteAssociationIDMap {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/sshkeygroup.go` at line 662, Replace the range loop that uses an unnecessary blank identifier over deletingSiteAssociationIDMap (currently written as "for sgaID, _ := range deletingSiteAssociationIDMap") with the idiomatic form "for sgaID := range deletingSiteAssociationIDMap" in the SSH key group deletion logic (refer to the loop using deletingSiteAssociationIDMap in sshkeygroup.go) so the blank identifier is removed while preserving sgaID usage.
612-612: 💤 Low valueOmit unnecessary blank identifier in range expression.
The blank identifier
_is superfluous when only the key is required; idiomatic Go usesfor stID := range.♻️ Suggested refinement
- for stID, _ := range newSiteAssociationIDMap { + for stID := range newSiteAssociationIDMap {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/sshkeygroup.go` at line 612, The range loop over newSiteAssociationIDMap uses an unnecessary blank identifier; update the loop in sshkeygroup.go to use the idiomatic form by iterating with "for stID := range newSiteAssociationIDMap" (replace "for stID, _ := range newSiteAssociationIDMap") so only the key is captured; ensure any code inside the loop still references stID as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Around line 1562-1569: The error message inside the transaction advisory lock
failure is incorrect for the Delete flow; update the cutil.NewAPIError call in
the cdb.WithTx block (where tx.TryAcquireAdvisoryLock is called) to reference
"delete" instead of "update" (e.g., "Failed to delete SSH Key Group, could not
acquire data store lock on Group") so the API error and logs accurately reflect
the Delete handler; keep the rest of the logic (logger.Error() and error return)
unchanged.
---
Nitpick comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Line 662: Replace the range loop that uses an unnecessary blank identifier
over deletingSiteAssociationIDMap (currently written as "for sgaID, _ := range
deletingSiteAssociationIDMap") with the idiomatic form "for sgaID := range
deletingSiteAssociationIDMap" in the SSH key group deletion logic (refer to the
loop using deletingSiteAssociationIDMap in sshkeygroup.go) so the blank
identifier is removed while preserving sgaID usage.
- Line 612: The range loop over newSiteAssociationIDMap uses an unnecessary
blank identifier; update the loop in sshkeygroup.go to use the idiomatic form by
iterating with "for stID := range newSiteAssociationIDMap" (replace "for stID, _
:= range newSiteAssociationIDMap") so only the key is captured; ensure any code
inside the loop still references stID as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b99a8fe4-9cb8-4035-b12d-d221c55f60e0
📒 Files selected for processing (1)
api/pkg/api/handler/sshkeygroup.go
Apply the WithTx pattern from NVIDIA#462 to the SSH Key Group API handlers. Takes into consideration a few previous code reviews for integrating these, ensuring: - We split assignment from error condition checking (thanks @thossain-nv). - We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv). - We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
api/pkg/api/handler/sshkeygroup.go (1)
1560-1560: 💤 Low valueVariable name
skgsasToSyncis misleading in the Delete handler context.This variable holds associations that will trigger
ExecuteDeleteSSHKeyGroupWorkflow(line 1649), not sync workflows. Renaming toskgsasToDeletewould improve clarity and align with the variable naming in the Update handler.♻️ Suggested rename for clarity
- var skgsasToSync []cdbm.SSHKeyGroupSiteAssociation + var skgsasToDelete []cdbm.SSHKeyGroupSiteAssociationThen update references at lines 1591, 1598, 1617, and 1647.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/sshkeygroup.go` at line 1560, Rename the misleading variable skgsasToSync in the Delete handler to skgsasToDelete to reflect that these associations will be deleted (used to trigger ExecuteDeleteSSHKeyGroupWorkflow), and update every reference to that variable in the Delete handler (all places where it’s declared, appended to, iterated over, and passed into ExecuteDeleteSSHKeyGroupWorkflow). Ensure the new name is used consistently and mirrors the naming used in the Update handler to improve clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Around line 854-874: Add a clarifying comment in the block that branches on
newSSHKeyIDMap/deletingKeyAssociationIDMap (the code iterating dbskgsas and
populating skgsasToSync/skgsasToDelete) explaining that when keys are
added/removed (newSSHKeyIDMap or deletingKeyAssociationIDMap non-empty) delete
workflows are intentionally deferred to the inventory reconciliation process;
note that associations with Status == SSHKeyGroupSiteAssociationStatusDeleting
are persisted earlier but skipped here, so immediate skgsasToDelete population
is skipped and cleanup will occur asynchronously via inventory reconciliation
rather than in this commit path.
---
Nitpick comments:
In `@api/pkg/api/handler/sshkeygroup.go`:
- Line 1560: Rename the misleading variable skgsasToSync in the Delete handler
to skgsasToDelete to reflect that these associations will be deleted (used to
trigger ExecuteDeleteSSHKeyGroupWorkflow), and update every reference to that
variable in the Delete handler (all places where it’s declared, appended to,
iterated over, and passed into ExecuteDeleteSSHKeyGroupWorkflow). Ensure the new
name is used consistently and mirrors the naming used in the Update handler to
improve clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 74189f44-aaf1-4a9d-bdfd-57693d0d74bf
📒 Files selected for processing (1)
api/pkg/api/handler/sshkeygroup.go
| // Determine workflow targets to trigger after commit | ||
| // If keys are added or removed, trigger workflow to sync SSH Key Group across all Sites, except for the ones that are deleted | ||
| if len(newSSHKeyIDMap) > 0 || len(deletingKeyAssociationIDMap) > 0 { | ||
| for _, skgsa := range dbskgsas { | ||
| if skgsa.Status == cdbm.SSHKeyGroupSiteAssociationStatusDeleting { | ||
| continue | ||
| } | ||
|
|
||
| skgsasToSync = append(skgsasToSync, skgsa) | ||
| } | ||
| } else { | ||
| for _, skgsa := range dbskgsas { | ||
| if newSiteAssociationIDMap[skgsa.SiteID.String()] { | ||
| skgsasToSync = append(skgsasToSync, skgsa) | ||
| } | ||
| } else { | ||
| for _, skgsa := range dbskgsas { | ||
| if newSiteAssociationIDMap[skgsa.SiteID.String()] { | ||
| skgsasToSync = append(skgsasToSync, skgsa) | ||
| } | ||
|
|
||
| if deletingSiteAssociationIDMap[skgsa.ID] { | ||
| skgsasToDelete = append(skgsasToDelete, skgsa) | ||
| if deletingSiteAssociationIDMap[skgsa.ID] { | ||
| skgsasToDelete = append(skgsasToDelete, skgsa) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if similar handlers trigger delete workflows in mixed-change scenarios
# Expected: Understand the pattern used elsewhere for delete workflow triggering
rg -nP --type=go -C5 'ExecuteDeleteSSHKeyGroupWorkflow|skgsasToDelete' api/pkg/api/handler/Repository: NVIDIA/infra-controller-rest
Length of output: 3536
🏁 Script executed:
# Check the context around line 663 where associations are marked as Deleting
sed -n '650,680p' api/pkg/api/handler/sshkeygroup.goRepository: NVIDIA/infra-controller-rest
Length of output: 1467
🏁 Script executed:
# Search for inventory reconciliation or eventual cleanup mechanism
rg -n 'inventory|reconciliation|eventual' api/pkg/api/handler/sshkeygroup.goRepository: NVIDIA/infra-controller-rest
Length of output: 400
🏁 Script executed:
# Check for any comments or TODOs about delete workflow or deleting associations
rg -n -B2 -A2 'TODO|FIXME|eventually' api/pkg/api/handler/sshkeygroup.goRepository: NVIDIA/infra-controller-rest
Length of output: 54
🏁 Script executed:
# Get broader context of the PATCH handler to understand the full flow
sed -n '500,550p' api/pkg/api/handler/sshkeygroup.goRepository: NVIDIA/infra-controller-rest
Length of output: 2378
Add clarifying comment: Delete workflows deferred to inventory reconciliation when keys change and sites are removed simultaneously.
The current control flow defers delete workflow execution to the inventory reconciliation mechanism when newSSHKeyIDMap or deletingKeyAssociationIDMap is non-empty. Associations marked as SSHKeyGroupSiteAssociationStatusDeleting are explicitly persisted to the database (lines 663–680) but skipped from skgsasToDelete population, preventing delete workflow execution in this code path.
While this behavior appears intentional (as evidenced by "unsynced groups will be triggered by inventory" comments elsewhere in the codebase), the deferral strategy should be explicitly documented at lines 854–874 to clarify that cleanup occurs asynchronously via inventory reconciliation rather than immediately. Add a comment explaining this design decision to prevent future maintainers from misinterpreting the skipped delete workflow execution as a bug.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/pkg/api/handler/sshkeygroup.go` around lines 854 - 874, Add a clarifying
comment in the block that branches on newSSHKeyIDMap/deletingKeyAssociationIDMap
(the code iterating dbskgsas and populating skgsasToSync/skgsasToDelete)
explaining that when keys are added/removed (newSSHKeyIDMap or
deletingKeyAssociationIDMap non-empty) delete workflows are intentionally
deferred to the inventory reconciliation process; note that associations with
Status == SSHKeyGroupSiteAssociationStatusDeleting are persisted earlier but
skipped here, so immediate skgsasToDelete population is skipped and cleanup will
occur asynchronously via inventory reconciliation rather than in this commit
path.
Description
Apply the WithTx pattern from #462 to the SSH Key Group API handlers.
Takes into consideration a few previous code reviews for integrating these, ensuring:
TerminateWorkflowOnTimeOuthelper and not duplicate code (thanks @thossain-nv).WithTxand not a weird mix ofWithTxandWithTxResult.Signed-off-by: Chet Nichols III chetn@nvidia.com
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes