Skip to content

feat: add cluster networking cache constants and struct fields#720

Merged
ArangoGutierrez merged 2 commits intoNVIDIA:mainfrom
ArangoGutierrez:feat/prod-clusters-cache
Mar 13, 2026
Merged

feat: add cluster networking cache constants and struct fields#720
ArangoGutierrez merged 2 commits intoNVIDIA:mainfrom
ArangoGutierrez:feat/prod-clusters-cache

Conversation

@ArangoGutierrez
Copy link
Copy Markdown
Collaborator

Summary

  • Adds 8 new cache constant keys for cluster networking resources: public-subnet-id, nat-gateway-id, private-route-table-id, public-route-table-id, cp-security-group-id, worker-security-group-id, eip-allocation-id, iam-instance-profile-arn
  • Adds corresponding struct fields to AWS struct so ClusterCache (which embeds AWS) inherits them automatically
  • Updates dumpCache(), unmarsalCache(), and updateStatus() to persist and restore all new fields -- ensures delete can clean up every resource

Architecture Decision: D5 -- shared cache layer enables all subsequent cluster networking tasks to persist their resource IDs without further cache changes.

Backward Compatibility: All existing single-node fields (SecurityGroupid, RouteTable, Subnetid) are preserved unchanged. Single-node mode is unaffected.

Test plan

  • TestCacheRoundTrip -- verifies all 15 fields (8 original + 7 new) survive a dumpCache -> unmarsalCache round-trip
  • TestCacheRoundTripSingleNode -- verifies single-node backward compatibility (new fields default to empty strings)
  • go test -count=1 ./pkg/provider/aws/... passes

@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review March 12, 2026 17:47
Copilot AI review requested due to automatic review settings March 12, 2026 17:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the AWS provider’s status/cache surface area to support upcoming cluster networking work by adding new cache keys and corresponding AWS struct fields, plus tests to verify cache persistence.

Changes:

  • Added new cache key constants and AWS struct fields for cluster networking resource identifiers.
  • Updated cache dump/unmarshal and status property update logic to include the new fields.
  • Added cache round-trip unit tests for the expanded AWS cache payload.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
pkg/provider/aws/aws.go Adds new cache key constants and AWS struct fields; updates dump/unmarshal logic.
pkg/provider/aws/status.go Extends status property initialization/update to include new cache-backed fields.
pkg/provider/aws/cache_test.go Adds unit tests verifying cache round-trip persistence and single-node backward compatibility.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +167 to +168
case PrivateRouteTable:
aws.RouteTable = p.Value
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrivateRouteTable is introduced as a distinct cache key, but it is not persisted in dumpCache() and there is no dedicated struct field for it. Additionally, unmarsalCache() maps PrivateRouteTable into aws.RouteTable, which can silently overwrite the existing route-table-id value if both keys are present. Either remove PrivateRouteTable entirely (if RouteTable is meant to be the private route table), or add a separate PrivateRouteTable field and persist/restore it consistently (and update status/cache tests accordingly).

Suggested change
case PrivateRouteTable:
aws.RouteTable = p.Value

Copilot uses AI. Check for mistakes.
Comment on lines +134 to +138
case PublicSubnetID:
if p.Value != cache.PublicSubnetid {
p.Value = cache.PublicSubnetid
modified = true
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The for _, p := range envCopy.Status.Properties loop ranges by value, so assignments like p.Value = ... only update a copy and do not mutate envCopy.Status.Properties. This means new fields (e.g., PublicSubnetID, NatGatewayID, etc.) will not actually be persisted when properties already exist, even though modified becomes true. Iterate by index (e.g., for i := range ...) or take pointers so the slice elements are updated before calling update().

Copilot uses AI. Check for mistakes.
Comment on lines +86 to +107
// Verify every field survives the round-trip
assertions := []struct {
name string
got string
expected string
}{
{"Vpcid", restored.Vpcid, original.Vpcid},
{"Subnetid", restored.Subnetid, original.Subnetid},
{"InternetGwid", restored.InternetGwid, original.InternetGwid},
{"InternetGatewayAttachment", restored.InternetGatewayAttachment, original.InternetGatewayAttachment},
{"RouteTable", restored.RouteTable, original.RouteTable},
{"SecurityGroupid", restored.SecurityGroupid, original.SecurityGroupid},
{"Instanceid", restored.Instanceid, original.Instanceid},
{"PublicDnsName", restored.PublicDnsName, original.PublicDnsName},
{"PublicSubnetid", restored.PublicSubnetid, original.PublicSubnetid},
{"NatGatewayid", restored.NatGatewayid, original.NatGatewayid},
{"PublicRouteTable", restored.PublicRouteTable, original.PublicRouteTable},
{"CPSecurityGroupid", restored.CPSecurityGroupid, original.CPSecurityGroupid},
{"WorkerSecurityGroupid", restored.WorkerSecurityGroupid, original.WorkerSecurityGroupid},
{"EIPAllocationid", restored.EIPAllocationid, original.EIPAllocationid},
{"IAMInstanceProfileArn", restored.IAMInstanceProfileArn, original.IAMInstanceProfileArn},
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The round-trip test asserts only the struct fields, but it does not cover the newly added PrivateRouteTable cache key at all. Given PrivateRouteTable exists in aws.go, this test suite will not catch regressions where that key is dropped or incorrectly mapped; consider adding coverage for it (or removing the constant if it isn't intended to be supported).

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +84
// TestCacheRoundTrip verifies that all AWS struct fields survive a
// dumpCache -> unmarsalCache round-trip. If a field is not persisted,
// delete will fail to clean up the corresponding resource.
func TestCacheRoundTrip(t *testing.T) {
tmpDir, err := os.MkdirTemp("", "cache-roundtrip-*")
if err != nil {
t.Fatalf("failed to create temp dir: %v", err)
}
defer func() { _ = os.RemoveAll(tmpDir) }()

cacheFile := filepath.Join(tmpDir, "cache.yaml")

env := &v1alpha1.Environment{
ObjectMeta: metav1.ObjectMeta{Name: "test-env"},
Spec: v1alpha1.EnvironmentSpec{
Provider: v1alpha1.ProviderAWS,
Instance: v1alpha1.Instance{
Region: "us-west-2",
},
},
}

client := &Client{
cacheFile: cacheFile,
Environment: env,
}

original := &AWS{
Vpcid: "vpc-abc123",
Subnetid: "subnet-def456",
InternetGwid: "igw-ghi789",
InternetGatewayAttachment: "vpc-abc123",
RouteTable: "rtb-jkl012",
SecurityGroupid: "sg-mno345",
Instanceid: "i-pqr678",
PublicDnsName: "ec2-1-2-3-4.compute.amazonaws.com",
// New cluster networking fields
PublicSubnetid: "subnet-pub789",
NatGatewayid: "nat-abc123",
PublicRouteTable: "rtb-pub456",
CPSecurityGroupid: "sg-cp789",
WorkerSecurityGroupid: "sg-worker012",
EIPAllocationid: "eipalloc-345",
IAMInstanceProfileArn: "arn:aws:iam::123456789012:instance-profile/holodeck",
}

// Dump the cache
if err := client.dumpCache(original); err != nil {
t.Fatalf("dumpCache failed: %v", err)
}

// Read it back
restored, err := client.unmarsalCache()
if err != nil {
t.Fatalf("unmarsalCache failed: %v", err)
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: the comment and error messages say unmarsalCache/unmarsalCache failed, but the correct term is "unmarshal". Even if the function name stays as-is for compatibility, the user-facing strings/comments should use the correct spelling.

Copilot uses AI. Check for mistakes.
Comment on lines +47 to +55
// Cluster networking cache keys
PublicSubnetID string = "public-subnet-id"
NatGatewayID string = "nat-gateway-id"
PrivateRouteTable string = "private-route-table-id"
PublicRouteTable string = "public-route-table-id"
CPSecurityGroupID string = "cp-security-group-id"
WorkerSecurityGroupID string = "worker-security-group-id"
EIPAllocationID string = "eip-allocation-id"
IAMInstanceProfileArn string = "iam-instance-profile-arn"
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description states 8 new cluster networking cache keys are persisted/restored, but in this change set private-route-table-id is not written in dumpCache()/updateStatus(), and there is no dedicated struct field for it (it is instead mapped onto RouteTable during unmarshal). Either update the implementation to truly support all 8 keys, or adjust the PR description to match what is actually persisted.

Copilot uses AI. Check for mistakes.
Add shared cache constants and AWS struct fields required for
production-grade cluster networking (D5 architecture decision).

New cache keys: public-subnet-id, nat-gateway-id, private-route-table-id,
public-route-table-id, cp-security-group-id, worker-security-group-id,
eip-allocation-id, iam-instance-profile-arn.

Updated dumpCache(), unmarsalCache(), and updateStatus() to persist and
restore all new fields. Backward compatible: existing single-node fields
are preserved unchanged.

Includes round-trip test verifying all fields survive dumpCache ->
unmarsalCache cycle for both cluster and single-node modes.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Fix gofmt alignment in constants and struct literals. Fix goimports
grouping in cache_test.go. Fix PrivateRouteTable unmarshal case that
was a no-op — it now correctly maps to the existing RouteTable field.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@ArangoGutierrez ArangoGutierrez force-pushed the feat/prod-clusters-cache branch from eef309e to e824c0e Compare March 12, 2026 18:32
@ArangoGutierrez ArangoGutierrez merged commit a0a995c into NVIDIA:main Mar 13, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants