feat: Add machine identity REST support #481
Summary by CodeRabbit
Walkthrough
This PR implements machine-identity support: REST endpoints (tenant-admin CRUD for identity config and token delegation), public .well-known JWKS/OpenID discovery, deterministic workflow IDs and PUT status logic, Temporal workflows and site activities, site-agent manager wiring, mock NICo test infra, CLI/TUI commands, and comprehensive tests.
Changes
Machine Identity Feature Implementation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
8a1bdf2 to e113d08
…oken delegation)
Adds a REST surface that proxies carbide-core Forge so Tenant Admins can
manage per-tenant JWT-SVID signing configuration and RFC 8693 token
delegation without direct site access. Runtime workloads still obtain
tokens from the DPU IMDS over mTLS — these endpoints are not on the hot
path for token issuance.
Under `/v2/org/{org}/carbide/site/{siteID}`:
- Authenticated (`FORGE_TENANT_ADMIN`, org from URL):
`PUT|GET|DELETE /identity/config`,
`PUT|GET|DELETE /identity/token-delegation`.
- Public (no auth, mounted before the versioned auth middleware so
verifiers without credentials can fetch metadata):
`GET /.well-known/jwks.json`,
`GET /.well-known/openid-configuration`,
`GET /.well-known/spiffe/jwks.json`.
The site controller is authoritative for config presence and issuance
policy; REST proxies via Temporal workflows on the site task queue.
Token-delegation responses expose only a SHA-256 hash of the client
secret — the raw secret is never returned after creation.
Signed-off-by: Parham Armani <parmani@nvidia.com>
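The split between public and authenticated routes described in this commit message can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the PR's code: the real router and middleware are not shown in this conversation, `isWellKnown`, `requireAuth`, `newRouter`, and `statusFor` are hypothetical names, and the bearer-token check is only a stand-in for the actual `FORGE_TENANT_ADMIN` authorization.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isWellKnown reports whether the path targets one of the public discovery
// documents listed in the commit message.
func isWellKnown(path string) bool {
	for _, suffix := range []string{
		"/.well-known/jwks.json",
		"/.well-known/openid-configuration",
		"/.well-known/spiffe/jwks.json",
	} {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

// requireAuth is a placeholder for the real auth middleware: it only checks
// that a bearer token is present, which is NOT how production auth works.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newRouter dispatches discovery requests before the auth middleware runs,
// so verifiers without credentials can still fetch metadata.
func newRouter(public, private http.Handler) http.Handler {
	protected := requireAuth(private)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodGet && isWellKnown(r.URL.Path) {
			public.ServeHTTP(w, r)
			return
		}
		protected.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the router and returns the status code.
func statusFor(method, path, token string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(method, path, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	newRouter(ok, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	base := "/v2/org/acme/carbide/site/site-1"
	fmt.Println(statusFor(http.MethodGet, base+"/.well-known/jwks.json", ""))
	fmt.Println(statusFor(http.MethodGet, base+"/identity/config", ""))
	fmt.Println(statusFor(http.MethodGet, base+"/identity/config", "token"))
}
```

In this sketch the discovery paths bypass the wrapper entirely, while everything else, including the `/identity/config` and `/identity/token-delegation` routes, passes through it.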
/ok to test 9e428ea
🔐 TruffleHog Secret Scan
✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials.
🕐 Last updated: 2026-05-07 23:14:48 UTC | Commit: 9e428ea
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/identity.go`:
- Around line 50-58: The payloadHash function currently truncates the SHA-256 to
8 bytes causing high collision risk; update payloadHash (the function that calls
proto.MarshalOptions{Deterministic: true}.Marshal and computes sha256.Sum256) to
return a longer digest — use at least 16 bytes (sum[:16]) or the full sum before
hex-encoding (hex.EncodeToString) so workflow IDs derived from payloads have
≥128 bits of entropy and avoid collisions when
WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING is used.
- Around line 771-778: The current check only ensures the presence of probe.Keys
but not that it is a JSON array; update the unmarshalling so that the local
probe variable validates that "keys" is an array (e.g., change probe.Keys to a
slice type like []json.RawMessage or unmarshal probe.Keys into
[]json.RawMessage) and return the same BadGateway error when the unmarshal into
an array fails or the array is empty; keep the same logging path
(logger.Error().Err(err).Str("orgName", org).Msg(...)) and error response via
cutil.NewAPIErrorResponse so malformed non-array JWKS payloads are rejected
before being forwarded.
In `@cli/tui/commands.go`:
- Line 2628: The echoed LogCmd calls currently print commands that cannot be
invoked as written (the "config" or "well-known" component is missing, and the
well-known subcommands are in the wrong order), so update each LogCmd invocation
that echoes machine-identity commands (e.g., the call LogCmd(s,
"machine-identity", "get", site.ID)) to reflect the real CLI order and names:
e.g., call LogCmd(s, "machine-identity", "config", "get", site.ID) for the
config commands and LogCmd(s, "machine-identity", "well-known", "jwks",
site.ID) for the discovery commands, for every occurrence referenced (the calls
at the other sites noted in the comment), ensuring the strings and argument
order match the actual invokable command hierarchy.
- Around line 2683-2689: The code currently parses ttlStr into u64 and accepts
any uint32; enforce the documented bounds 300–86400 client-side by validating
the parsed value before setting body["tokenTtlSec"]. After
strconv.ParseUint(ttlStr, 10, 32) in the same block, check that u64 >= 300 &&
u64 <= 86400 and return a descriptive fmt.Errorf (e.g., "token TTL must be
between 300 and 86400 seconds") if it falls outside that range; only then set
body["tokenTtlSec"] = uint32(u64).
In `@openapi/spec.yaml`:
- Around line 12604-12611: The SPIFFE JWKS response documentation is missing a
Cache-Control response header like the OIDC JWKS endpoint; update the responses
for the SPIFFE JWKS path (the 200 response that references the
components/schemas/JWKS) to include a Cache-Control header entry (same semantics
as the /.well-known/jwks.json documentation) so clients receive explicit caching
guidance; modify the OpenAPI operation that returns the SPIFFE JWKS to add a
headers: Cache-Control with description and schema type string matching the
existing OIDC JWKS header documentation.
- Around line 21517-21520: The response schema IdentityConfig.tokenTtlSec
currently hardcodes minimum: 300 and maximum: 86400 which conflicts with
IdentityConfigUpdateRequest.tokenTtlSec (minimum: 1) — remove the fixed
minimum/maximum from the IdentityConfig.tokenTtlSec schema so the response does
not advertise server-enforced bounds, or alternatively, if those bounds are true
system-wide invariants, update IdentityConfigUpdateRequest.tokenTtlSec to match
and add a short description documenting the rationale; locate and edit the
tokenTtlSec entry in the IdentityConfig and IdentityConfigUpdateRequest schemas
to keep them consistent.
In `@site-workflow/pkg/activity/identity_test.go`:
- Around line 141-168: The test should assert the exact redaction and hashing
contract: after calling m.SetTokenDelegationOnSite verify
resp.GetClientSecretBasic() is non-nil, assert that basic.GetClientSecret() (the
raw field) is empty/nil/zero, and assert that basic.GetClientSecretHash() equals
the expected SHA-256 hex of "super-secret" (compute SHA-256 and compare to
basic.GetClientSecretHash()), while still checking it does not contain the raw
secret; this locks down the use of SHA-256 and prevents regressions in
SetTokenDelegationOnSite, resp.GetClientSecretBasic(), basic.GetClientSecret()
and basic.GetClientSecretHash().
In `@site-workflow/pkg/grpc/server/nico_test_server.go`:
- Around line 1530-1539: The mock server path handling in the branch where
in.GetClientSecretBasic() != nil validates client_id but fails to reject an
empty client_secret; add a validation that checks
strings.TrimSpace(basic.GetClientSecret()) and if empty return
status.Errorf(codes.InvalidArgument, "client_secret is required for
client_secret_basic"); keep the existing response construction
(cwssaws.TokenDelegationResponse_ClientSecretBasic and
cwssaws.ClientSecretBasicResponse) and continue to use
clientSecretDisplayHash(basic.GetClientSecret()) only after the new non-empty
check passes.
- Around line 1423-1473: SetIdentityConfiguration currently accepts an empty
issuer or a zero TokenTtlSec and persists the issuer template verbatim; add
validation and normalization before persisting: reject an empty issuer and
TokenTtlSec == 0 (return InvalidArgument), and resolve the issuer template
(e.g., via a new helper such as normalizeIssuerTemplate) so that the "{org}"
placeholder is replaced with the organization ID and a literal "{org}" never
leaks into the stored Issuer; set resp.Config.Issuer to the normalized value
(instead of in.GetIssuer()) and keep the rest of the flow (calls to
normalizeAllowedAudiences, generateES256KeyMaterial, and updates to
f.identityConfigs/f.identityKeys) unchanged.
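The JWKS shape check requested for `identity.go` (around lines 771-778) has no code shown in this conversation. A minimal sketch of such a check, assuming the upstream payload arrives as raw bytes; `validateJWKS` is a hypothetical helper name, and the real handler would log and return its BadGateway response rather than a plain error:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validateJWKS is a hypothetical helper: it accepts a payload only when the
// top-level "keys" member is a non-empty JSON array.
func validateJWKS(payload []byte) error {
	var probe struct {
		// Decoding into a slice makes json.Unmarshal reject non-array values
		// such as {"keys": "oops"} or {"keys": {}}.
		Keys []json.RawMessage `json:"keys"`
	}
	if err := json.Unmarshal(payload, &probe); err != nil {
		return fmt.Errorf("malformed JWKS: %w", err)
	}
	// A missing "keys", an explicit null, and an empty array all leave the
	// slice with length zero.
	if len(probe.Keys) == 0 {
		return errors.New(`malformed JWKS: "keys" must be a non-empty array`)
	}
	return nil
}

func main() {
	for _, p := range []string{
		`{"keys":[{"kty":"EC"}]}`,
		`{"keys":"oops"}`,
		`{"keys":[]}`,
		`{}`,
	} {
		fmt.Printf("%-26s valid=%v\n", p, validateJWKS([]byte(p)) == nil)
	}
}
```

Whether an empty array should be rejected is a policy choice; the comment above asks for it, so the sketch follows that reading.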
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b2e282d8-4af0-4f01-9c76-df52a7d3334d
📒 Files selected for processing (28)
- api/internal/server/server.go
- api/pkg/api/handler/identity.go
- api/pkg/api/handler/identity_test.go
- api/pkg/api/handler/util/common/common.go
- api/pkg/api/model/identity.go
- api/pkg/api/model/identity_test.go
- api/pkg/api/routes.go
- api/pkg/api/routes_test.go
- cli/pkg/commands_test.go
- cli/pkg/spec_test.go
- cli/tui/commands.go
- cli/tui/repl.go
- docs/index.html
- openapi/spec.yaml
- site-agent/pkg/components/managers/identity/access.go
- site-agent/pkg/components/managers/identity/init.go
- site-agent/pkg/components/managers/identity/subscriber.go
- site-agent/pkg/components/managers/manager.go
- site-agent/pkg/components/managers/manageraccess.go
- site-agent/pkg/components/managers/managerapi/machineidentity_api.go
- site-agent/pkg/components/managers/managerapi/managerapi.go
- site-agent/pkg/components/managers/workflow/orchestrator.go
- site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
- site-workflow/pkg/activity/identity.go
- site-workflow/pkg/activity/identity_test.go
- site-workflow/pkg/grpc/client/testing.go
- site-workflow/pkg/grpc/server/nico_test_server.go
- site-workflow/pkg/workflow/identity.go
// payloadHash returns a deterministic short hex digest of the proto message.
func payloadHash(m proto.Message) (string, error) {
	b, err := proto.MarshalOptions{Deterministic: true}.Marshal(m)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:8]), nil
}
Use a longer digest for payload-derived workflow IDs.
Only 8 bytes of the SHA-256 are kept here. Because the PUT handlers use WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, a collision aliases two different request bodies to the same workflow execution and can return the wrong result. Keep at least 128 bits, or just use the full digest.
Suggested fix
 func payloadHash(m proto.Message) (string, error) {
 	b, err := proto.MarshalOptions{Deterministic: true}.Marshal(m)
 	if err != nil {
 		return "", err
 	}
 	sum := sha256.Sum256(b)
-	return hex.EncodeToString(sum[:8]), nil
+	return hex.EncodeToString(sum[:16]), nil
 }
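To put rough numbers on the collision concern, the birthday approximation p ≈ 1 - exp(-n² / 2^(b+1)) estimates the chance that n distinct payloads produce at least one colliding b-bit digest. The sketch below is illustrative only: `truncatedDigest` and `collisionProbability` are hypothetical helpers, and the digest is computed over raw bytes rather than a deterministically marshaled proto message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
)

// truncatedDigest hex-encodes the first n bytes of the SHA-256 of data.
// n must be between 0 and sha256.Size (32).
func truncatedDigest(data []byte, n int) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:n])
}

// collisionProbability applies the birthday approximation
// p ~= 1 - exp(-n^2 / 2^(bits+1)) for n uniformly random digests.
func collisionProbability(n float64, bits int) float64 {
	exponent := -(n * n) / math.Exp2(float64(bits+1))
	// Expm1 keeps precision when the exponent is very close to zero.
	return -math.Expm1(exponent)
}

func main() {
	payload := []byte(`{"tokenTtlSec":3600}`)
	fmt.Println("8 bytes: ", truncatedDigest(payload, 8))
	fmt.Println("16 bytes:", truncatedDigest(payload, 16))
	for _, n := range []float64{1e6, 1e9, math.Exp2(32)} {
		fmt.Printf("n=%.3g  64-bit: %.3g  128-bit: %.3g\n",
			n, collisionProbability(n, 64), collisionProbability(n, 128))
	}
}
```

Under this estimate a 64-bit digest only reaches a collision probability of roughly 0.39 at about 2^32 distinct payloads. Because a collision under WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING silently returns another request's result, the wider digest removes that failure mode at the cost of a longer workflow ID.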
	if err != nil {
		return err
	}
	LogCmd(s, "machine-identity", "get", site.ID)
Fix machine-identity LogCmd output to match real command names.
Several echoed commands are not invokable as printed (missing config or well-known, and order mismatch), which hurts copy/paste usability.
Suggested fix
-	LogCmd(s, "machine-identity", "get", site.ID)
+	LogCmd(s, "machine-identity", "config", "get", site.ID)
-	LogCmd(s, "machine-identity", "update", site.ID)
+	LogCmd(s, "machine-identity", "config", "update", site.ID)
-	LogCmd(s, "machine-identity", "delete", site.ID)
+	LogCmd(s, "machine-identity", "config", "delete", site.ID)
-	LogCmd(s, "machine-identity", "jwks", "get", site.ID)
+	LogCmd(s, "machine-identity", "well-known", "jwks", site.ID)
-	LogCmd(s, "machine-identity", "spiffe-jwks", "get", site.ID)
+	LogCmd(s, "machine-identity", "well-known", "spiffe-jwks", site.ID)
-	LogCmd(s, "machine-identity", "openid-configuration", "get", site.ID)
+	LogCmd(s, "machine-identity", "well-known", "openid", site.ID)
Also applies to: 2701-2701, 2724-2724, 2826-2826, 2840-2840, 2854-2854
	if strings.TrimSpace(ttlStr) != "" {
		u64, perr := strconv.ParseUint(strings.TrimSpace(ttlStr), 10, 32)
		if perr != nil {
			return fmt.Errorf("invalid token TTL: %w", perr)
		}
		body["tokenTtlSec"] = uint32(u64)
	}
Enforce the documented tokenTtlSec bounds client-side.
The prompt states 300-86400, but current parsing accepts any uint32, pushing avoidable validation failures to the server.
Suggested fix
 	if strings.TrimSpace(ttlStr) != "" {
 		u64, perr := strconv.ParseUint(strings.TrimSpace(ttlStr), 10, 32)
 		if perr != nil {
 			return fmt.Errorf("invalid token TTL: %w", perr)
 		}
+		if u64 < 300 || u64 > 86400 {
+			return fmt.Errorf("token TTL must be between 300 and 86400 seconds")
+		}
 		body["tokenTtlSec"] = uint32(u64)
 	}
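The same bounds check can be pulled into a small helper so it is testable in isolation. This is a sketch, not the PR's code: `parseTokenTTL` and the two constants are hypothetical names, and the 300 to 86400 range is taken from the prompt text quoted in the comment above rather than from a verified server contract.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Bounds as stated in the TUI prompt; treat them as assumptions if the server
// enforces something different.
const (
	minTokenTTLSec = 300
	maxTokenTTLSec = 86400
)

// parseTokenTTL returns (ttl, true, nil) for a valid value, (0, false, nil)
// when the input is blank (meaning "leave unset"), and an error otherwise.
func parseTokenTTL(raw string) (uint32, bool, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return 0, false, nil
	}
	u64, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return 0, false, fmt.Errorf("invalid token TTL: %w", err)
	}
	if u64 < minTokenTTLSec || u64 > maxTokenTTLSec {
		return 0, false, fmt.Errorf("token TTL must be between %d and %d seconds, got %d",
			minTokenTTLSec, maxTokenTTLSec, u64)
	}
	return uint32(u64), true, nil
}

func main() {
	for _, in := range []string{"", " 3600 ", "299", "86401", "abc"} {
		ttl, set, err := parseTokenTTL(in)
		fmt.Printf("%q -> ttl=%d set=%v err=%v\n", in, ttl, set, err)
	}
}
```

A caller would set `body["tokenTtlSec"]` only when the second return value is true, which preserves the existing behavior of omitting the field for blank input.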
t.Run("success with client_secret_basic (hash returned, raw never)", func(t *testing.T) {
	req := &cwssaws.TokenDelegationRequest{
		OrganizationId: "acme-corp",
		Config: &cwssaws.TokenDelegation{
			TokenEndpoint: "https://auth.acme.com/oauth2/token",
			SubjectTokenAudience: "acme-exchange",
			AuthMethodConfig: &cwssaws.TokenDelegation_ClientSecretBasic{
				ClientSecretBasic: &cwssaws.ClientSecretBasic{
					ClientId: "client-123",
					ClientSecret: "super-secret",
				},
			},
		},
	}
	resp, err := m.SetTokenDelegationOnSite(ctx, req)
	require.NoError(t, err)
	require.NotNil(t, resp)
	assert.Equal(t, "acme-corp", resp.GetOrganizationId())
	assert.Equal(t, "https://auth.acme.com/oauth2/token", resp.GetTokenEndpoint())
	basic := resp.GetClientSecretBasic()
	require.NotNil(t, basic, "response oneof should carry hashed client_secret")
	assert.Equal(t, "client-123", basic.GetClientId())
	assert.NotEmpty(t, basic.GetClientSecretHash())

	// Critical security invariant: the raw secret must never appear anywhere
	// in the response proto.
	assert.NotContains(t, basic.GetClientSecretHash(), "super-secret")
})
Assert the exact secret-redaction contract.
This subtest only proves that the returned value is non-empty and does not literally contain the raw secret. A regression to a different hash algorithm, or a response that still populates client_secret, would still pass. Since this API promises “secret never returned” and a SHA-256 hash specifically, lock that down here with exact assertions.
Proposed test hardening
 import (
 	"context"
+	"crypto/sha256"
+	"encoding/hex"
 	"testing"
@@
 	basic := resp.GetClientSecretBasic()
 	require.NotNil(t, basic, "response oneof should carry hashed client_secret")
 	assert.Equal(t, "client-123", basic.GetClientId())
-	assert.NotEmpty(t, basic.GetClientSecretHash())
+	assert.Empty(t, basic.GetClientSecret())
+	expected := sha256.Sum256([]byte("super-secret"))
+	assert.Equal(t, hex.EncodeToString(expected[:]), basic.GetClientSecretHash())
 	// Critical security invariant: the raw secret must never appear anywhere
 	// in the response proto.
 	assert.NotContains(t, basic.GetClientSecretHash(), "super-secret")
func (f *NICoServerImpl) SetIdentityConfiguration(ctx context.Context, req *cwssaws.IdentityConfigRequest) (*cwssaws.IdentityConfigResponse, error) {
	if req == nil || req.GetOrganizationId() == "" || req.GetConfig() == nil {
		return nil, status.Errorf(codes.InvalidArgument, "Invalid request argument")
	}
	in := req.GetConfig()
	if strings.TrimSpace(in.GetDefaultAudience()) == "" {
		return nil, status.Errorf(codes.InvalidArgument, "default_audience is required")
	}
	allowed, err := normalizeAllowedAudiences(in.GetDefaultAudience(), in.GetAllowedAudiences())
	if err != nil {
		return nil, status.Errorf(codes.InvalidArgument, "%s", err.Error())
	}

	orgID := req.GetOrganizationId()
	now := timestamppb.Now()

	existing, isUpdate := f.identityConfigs[orgID]
	resp := &cwssaws.IdentityConfigResponse{
		OrganizationId: orgID,
		Config: &cwssaws.IdentityConfig{
			Enabled: in.GetEnabled(),
			Issuer: in.GetIssuer(),
			DefaultAudience: in.GetDefaultAudience(),
			AllowedAudiences: allowed,
			TokenTtlSec: in.GetTokenTtlSec(),
			SubjectPrefix: in.SubjectPrefix,
			RotateKey: false,
		},
		UpdatedAt: now,
	}

	if isUpdate && !in.GetRotateKey() {
		// Update path: keep the existing key + created-at.
		resp.KeyId = existing.GetKeyId()
		resp.CreatedAt = existing.GetCreatedAt()
	} else {
		// First-create or rotate-key: generate a fresh ES256 keypair.
		newKey, err := generateES256KeyMaterial()
		if err != nil {
			return nil, status.Errorf(codes.Internal, "failed to generate signing key: %v", err)
		}
		f.identityKeys[orgID] = newKey
		resp.KeyId = newKey.kid
		if isUpdate {
			resp.CreatedAt = existing.GetCreatedAt()
		} else {
			resp.CreatedAt = now
		}
	}
	f.identityConfigs[orgID] = resp
	return resp, nil
Keep the mock identity-config contract aligned with the controller.
SetIdentityConfiguration currently accepts an empty issuer / zero TTL and stores the issuer template verbatim. That lets mock-backed tests succeed on input the real controller is supposed to reject, and it can leak a literal {org} into OIDC discovery and JWT iss output instead of the resolved org-specific issuer. Normalize and validate those fields here before persisting the config.
Suggested fix
 func (f *NICoServerImpl) SetIdentityConfiguration(ctx context.Context, req *cwssaws.IdentityConfigRequest) (*cwssaws.IdentityConfigResponse, error) {
 	if req == nil || req.GetOrganizationId() == "" || req.GetConfig() == nil {
 		return nil, status.Errorf(codes.InvalidArgument, "Invalid request argument")
 	}
 	in := req.GetConfig()
 	if strings.TrimSpace(in.GetDefaultAudience()) == "" {
 		return nil, status.Errorf(codes.InvalidArgument, "default_audience is required")
 	}
+	issuer := strings.ReplaceAll(strings.TrimSpace(in.GetIssuer()), "{org}", req.GetOrganizationId())
+	if issuer == "" {
+		return nil, status.Errorf(codes.InvalidArgument, "issuer is required")
+	}
+	if in.GetTokenTtlSec() == 0 {
+		return nil, status.Errorf(codes.InvalidArgument, "token_ttl_sec is required")
+	}
 	allowed, err := normalizeAllowedAudiences(in.GetDefaultAudience(), in.GetAllowedAudiences())
 	if err != nil {
 		return nil, status.Errorf(codes.InvalidArgument, "%s", err.Error())
 	}
 	orgID := req.GetOrganizationId()
 	now := timestamppb.Now()
@@
 		OrganizationId: orgID,
 		Config: &cwssaws.IdentityConfig{
 			Enabled: in.GetEnabled(),
-			Issuer: in.GetIssuer(),
+			Issuer: issuer,
 			DefaultAudience: in.GetDefaultAudience(),
 			AllowedAudiences: allowed,
 			TokenTtlSec: in.GetTokenTtlSec(),
 			SubjectPrefix: in.SubjectPrefix,
 			RotateKey: false,
	if basic := in.GetClientSecretBasic(); basic != nil {
		if strings.TrimSpace(basic.GetClientId()) == "" {
			return nil, status.Errorf(codes.InvalidArgument, "client_id is required for client_secret_basic")
		}
		resp.AuthMethodConfig = &cwssaws.TokenDelegationResponse_ClientSecretBasic{
			ClientSecretBasic: &cwssaws.ClientSecretBasicResponse{
				ClientId: basic.GetClientId(),
				ClientSecretHash: clientSecretDisplayHash(basic.GetClientSecret()),
			},
		}
Reject blank client_secret_basic.clientSecret in the mock server.
When client_secret_basic is present, this path validates client_id but not client_secret. A blank secret is currently accepted and hashed, so workflow/activity tests can go green on input that production should reject as INVALID_ARGUMENT.
Suggested fix
 	if basic := in.GetClientSecretBasic(); basic != nil {
 		if strings.TrimSpace(basic.GetClientId()) == "" {
 			return nil, status.Errorf(codes.InvalidArgument, "client_id is required for client_secret_basic")
 		}
+		if strings.TrimSpace(basic.GetClientSecret()) == "" {
+			return nil, status.Errorf(codes.InvalidArgument, "client_secret is required for client_secret_basic")
+		}
 		resp.AuthMethodConfig = &cwssaws.TokenDelegationResponse_ClientSecretBasic{
 			ClientSecretBasic: &cwssaws.ClientSecretBasicResponse{
 				ClientId: basic.GetClientId(),
 				ClientSecretHash: clientSecretDisplayHash(basic.GetClientSecret()),
 			},
…wasn't a requirement by SDD nor required by the standards
Signed-off-by: Parham Armani <parmani@nvidia.com>
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ab7d0f91-f688-411a-9141-1c1aa44184f2
📒 Files selected for processing (2)
- docs/index.html
- openapi/spec.yaml
JWKS:
  title: JWKS
  description: RFC 7517 JSON Web Key Set. Pass-through from the site controller.
  type: object
  properties:
    keys:
      type: array
      items:
Require JWKS.keys to match the documented constant-shape response.
At Line [21597], keys is defined but not required. The path docs promise {"keys":[]} when empty, so the schema should require keys to keep generated clients and validators aligned with the contract.
Proposed fix
  JWKS:
    title: JWKS
    description: RFC 7517 JSON Web Key Set. Pass-through from the site controller.
    type: object
+   required:
+     - keys
    properties:
      keys:
        type: array
As per coding guidelines for openapi/spec.yaml: Review the OpenAPI specification, check for consistency and correctness.
📝 Committable suggestion
  JWKS:
    title: JWKS
    description: RFC 7517 JSON Web Key Set. Pass-through from the site controller.
    type: object
+   required:
+     - keys
    properties:
      keys:
        type: array
        items:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openapi/spec.yaml` around lines 21592 - 21599, The JWKS schema currently
defines the keys property but does not mark it required; update the JWKS schema
(symbol: JWKS) to require the keys property so that generated clients/validators
get {"keys": []} when empty—add a required: ["keys"] entry under the JWKS object
and ensure the existing keys property remains an array (items as currently
defined) so the contract stays consistent.
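On the server side, the constant-shape `{"keys": []}` response also depends on how the value is serialized. The sketch below assumes a hypothetical Go type `JWKS` (the PR's actual model is not shown here) and uses only `encoding/json`; it shows that a non-nil empty slice without `omitempty` yields the documented shape, while a nil slice yields `null`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JWKS is a hypothetical Go mirror of the OpenAPI schema, for illustration only.
type JWKS struct {
	// No omitempty: the "keys" member must always be present in the output.
	Keys []json.RawMessage `json:"keys"`
}

// emptyJWKS returns a key set whose slice is non-nil, so it encodes as [].
func emptyJWKS() JWKS {
	return JWKS{Keys: []json.RawMessage{}}
}

// encode marshals a key set to a JSON string, returning "" on failure.
func encode(set JWKS) string {
	b, err := json.Marshal(set)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(emptyJWKS())) // non-nil empty slice
	fmt.Println(encode(JWKS{}))      // nil slice
}
```

Marking `keys` as required in the schema lets generated clients and validators reject the `null` or missing variants instead of silently accepting them.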
Signed-off-by: Parham Armani <parmani@nvidia.com>
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6b4a87aa-4907-40b1-9557-4450a9bd6f18
📒 Files selected for processing (2)
- api/pkg/api/handler/identity.go
- api/pkg/api/handler/identity_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/pkg/api/handler/identity_test.go
site, err = common.GetSiteFromIDString(ctx, nil, siteID, dbSession)
if err != nil {
	logger.Warn().Err(err).Str("Site ID", siteID).Msg("error getting site from request")
	return "", nil, nil, span, logger, ctx, cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Error retrieving Site in request", nil)
}
Differentiate “site absent” from real site-lookup failures.
Lines 127-131 and 170-174 treat every GetSiteFromIDString failure as a client/absent state. That can surface DB/backend errors as 400, 404, or an empty JWKS, which is misleading during outages.
💡 Suggested fix
@@
- 	site, err = common.GetSiteFromIDString(ctx, nil, siteID, dbSession)
- 	if err != nil {
- 		logger.Warn().Err(err).Str("Site ID", siteID).Msg("error getting site from request")
- 		return "", nil, nil, span, logger, ctx, cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Error retrieving Site in request", nil)
- 	}
+ 	site, err = common.GetSiteFromIDString(ctx, nil, siteID, dbSession)
+ 	if err != nil {
+ 		if err == cdb.ErrDoesNotExist {
+ 			logger.Warn().Err(err).Str("Site ID", siteID).Msg("site not found")
+ 			return "", nil, nil, span, logger, ctx, cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Error retrieving Site in request", nil)
+ 		}
+ 		logger.Error().Err(err).Str("Site ID", siteID).Msg("failed to retrieve Site")
+ 		return "", nil, nil, span, logger, ctx, cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve Site", nil)
+ 	}
@@
- 	site, err := common.GetSiteFromIDString(ctx, nil, siteID, dbSession)
- 	if err != nil {
- 		logger.Warn().Err(err).Msg("error getting Site from request")
- 		return "", nil, nil, logger, ctx, errPublicIdentityAbsent
- 	}
+ 	site, err := common.GetSiteFromIDString(ctx, nil, siteID, dbSession)
+ 	if err != nil {
+ 		if err == cdb.ErrDoesNotExist {
+ 			logger.Warn().Err(err).Msg("site not found in public identity lookup")
+ 			return "", nil, nil, logger, ctx, errPublicIdentityAbsent
+ 		}
+ 		logger.Error().Err(err).Msg("failed to retrieve Site for public identity lookup")
+ 		return "", nil, nil, logger, ctx, cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve Site", nil)
+ 	}
Based on learnings: cdb.ErrDoesNotExist is a non-wrapped DB sentinel in this repository, so direct equality checks are intentional and safe.
Also applies to: 170-174
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/pkg/api/handler/identity.go` around lines 127 - 131, The code currently
treats any error from common.GetSiteFromIDString as a client/absent condition;
change it to check for the repository sentinel cdb.ErrDoesNotExist (using direct
equality) and preserve the existing 400/404-style response for that case, but
for any other non-nil err return a 500-style response (use
http.StatusInternalServerError via cutil.NewAPIErrorResponse), and ensure the
logger records the full error and context; apply the same change to both call
sites that use GetSiteFromIDString (the blocks that set site, err and return
with logger, span, ctx, cutil.NewAPIErrorResponse).
Adds a REST surface that proxies carbide-core Forge so Tenant Admins can manage per-tenant JWT-SVID signing configuration and RFC 8693 token delegation without direct site access.
Under /v2/org/{org}/carbide/site/{siteID}:
- Authenticated (FORGE_TENANT_ADMIN, org from URL): PUT|GET|DELETE /identity/config, PUT|GET|DELETE /identity/token-delegation.
- Public (no auth): GET /.well-known/jwks.json, GET /.well-known/openid-configuration, GET /.well-known/spiffe/jwks.json.
Implementation spans:
- API: handlers, models, routes, and config in api/ (notably api/pkg/api/handler/identity.go, api/pkg/api/model/identity.go, api/pkg/api/routes.go, api/internal/config/).
- Site Workflow: identity workflow plus the activities that call Forge gRPC (site-workflow/pkg/workflow/identity.go, site-workflow/pkg/activity/identity.go).
- Forge gRPC mocks for tests: new in-memory Forge test server and client test helpers (site-workflow/pkg/grpc/server/forge_test_server.go, site-workflow/pkg/grpc/client/testing.go) so identity activities and workflows can be exercised without a real carbide-core.
- Site Agent: registers the new MachineIdentity manager on the Temporal worker (site-agent/pkg/components/managers/identity/, managerapi/machineidentity_api.go, manager.go, orchestrator wiring).
- OpenAPI + CarbideCLI: new Machine Identity tag in openapi/spec.yaml (with regenerated docs/index.html) and matching CLI/TUI command coverage in cli/tui/commands.go, cli/tui/repl.go, plus CLI tests (cli/pkg/commands_test.go, cli/pkg/spec_test.go) so the generated CLI exposes the new endpoints out of the box.
- Docs/scripts: supporting material under docs/machine-identity/ and optional shell harnesses scripts/machine-identity-e2e.sh and scripts/machine-identity-test-matrix.sh.
Feature - New feature or functionality (feat:)
Fix - Bug fixes (fix:)
Chore - Modification or removal of existing functionality (chore:)
Refactor - Refactoring of existing functionality (refactor:)
Docs - Changes in documentation or OpenAPI schema (docs:)
CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
Version - Issuing a new release version (version:)
API - API models or endpoints updated
Workflow - Workflow service updated
DB - DB DAOs or migrations updated
Site Manager - Site Manager updated
Cert Manager - Cert Manager updated
Site Agent - Site Agent updated
RLA - RLA service updated
Powershelf Manager - Powershelf Manager updated
NVSwitch Manager - NVSwitch Manager updated
This PR contains breaking changes
Unit tests added/updated
Integration tests added/updated
Manual testing performed
No testing required (docs, internal refactor, etc.)
New/expanded unit coverage:
- api/pkg/api/handler/identity_test.go, api/pkg/api/model/identity_test.go, api/internal/config/config_test.go for handler/model/config behavior.
- api/pkg/api/routes_test.go updated to count the six new /identity/config and /identity/token-delegation routes in the authenticated tier, and a new TestNewWellKnownRoutes enumerates the public .well-known/{jwks.json,openid-configuration,spiffe/jwks.json} routes by exact (path, method) so any accidental promotion of an unauthenticated route into the auth-protected group fails loudly.
- site-workflow/pkg/activity/identity_test.go for the Forge-backed activities, driven by the new in-memory Forge mock (site-workflow/pkg/grpc/server/forge_test_server.go, site-workflow/pkg/grpc/client/testing.go).
- cli/pkg/commands_test.go / cli/pkg/spec_test.go to verify the generated CLI surface.

- Auth uses the tenant org from the URL; the site controller remains authoritative for config presence (NOT_FOUND when missing).
- Per-site defaults (machineIdentity.defaults.<siteID>) can supply issuer (with {org} placeholder), tokenTtlSec, and subjectPrefix; missing required fields return 400.
- Identity config PUT uses full-resolution CreatedAt/UpdatedAt comparison to decide 201 vs 200.
- Token-delegation GET exposes only a SHA-256 hash of the client secret; the secret is never returned after creation.
- Handlers use the same synchronous workflow pattern as other handlers in this package (ExecuteWorkflow + typed Get, timeout termination, UnwrapWorkflowError) rather than ExecuteSyncWorkflow, so responses and runaway-workflow cleanup stay consistent with the rest of the API.
- The well-known routes live in a separate NewWellKnownRoutes function (api/pkg/api/routes.go) and are mounted on the root echo with the version prefix, before the versioned routeGroup's auth middleware in api/internal/server/server.go. Keeping them out of NewAPIRoutes is what allows JWT verifiers without credentials to fetch JWKS/OIDC metadata, and the dedicated TestNewWellKnownRoutes guards that boundary.
- The db/cmd/migrations/migrations binary in the diff is a build artifact and should be removed before merge (no schema changes belong to this PR; leave DB unchecked).
- CarbideCLI: the new Machine Identity OpenAPI tag means the generated CLI/TUI picks up identity/config, identity/token-delegation, and the .well-known reads automatically; new TUI command wiring lives in cli/tui/commands.go / cli/tui/repl.go.