feat: add cluster networking cache constants and struct fields#720
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the AWS provider’s status/cache surface area to support upcoming cluster networking work by adding new cache keys and corresponding AWS struct fields, plus tests to verify cache persistence.
Changes:
- Added new cache key constants and
AWSstruct fields for cluster networking resource identifiers. - Updated cache dump/unmarshal and status property update logic to include the new fields.
- Added cache round-trip unit tests for the expanded
AWScache payload.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
pkg/provider/aws/aws.go |
Adds new cache key constants and AWS struct fields; updates dump/unmarshal logic. |
pkg/provider/aws/status.go |
Extends status property initialization/update to include new cache-backed fields. |
pkg/provider/aws/cache_test.go |
Adds unit tests verifying cache round-trip persistence and single-node backward compatibility. |
You can also share your feedback on Copilot code review. Take the survey.
| case PrivateRouteTable: | ||
| aws.RouteTable = p.Value |
There was a problem hiding this comment.
PrivateRouteTable is introduced as a distinct cache key, but it is not persisted in dumpCache() and there is no dedicated struct field for it. Additionally, unmarsalCache() maps PrivateRouteTable into aws.RouteTable, which can silently overwrite the existing route-table-id value if both keys are present. Either remove PrivateRouteTable entirely (if RouteTable is meant to be the private route table), or add a separate PrivateRouteTable field and persist/restore it consistently (and update status/cache tests accordingly).
| case PrivateRouteTable: | |
| aws.RouteTable = p.Value |
| case PublicSubnetID: | ||
| if p.Value != cache.PublicSubnetid { | ||
| p.Value = cache.PublicSubnetid | ||
| modified = true | ||
| } |
There was a problem hiding this comment.
The for _, p := range envCopy.Status.Properties loop ranges by value, so assignments like p.Value = ... only update a copy and do not mutate envCopy.Status.Properties. This means new fields (e.g., PublicSubnetID, NatGatewayID, etc.) will not actually be persisted when properties already exist, even though modified becomes true. Iterate by index (e.g., for i := range ...) or take pointers so the slice elements are updated before calling update().
| // Verify every field survives the round-trip | ||
| assertions := []struct { | ||
| name string | ||
| got string | ||
| expected string | ||
| }{ | ||
| {"Vpcid", restored.Vpcid, original.Vpcid}, | ||
| {"Subnetid", restored.Subnetid, original.Subnetid}, | ||
| {"InternetGwid", restored.InternetGwid, original.InternetGwid}, | ||
| {"InternetGatewayAttachment", restored.InternetGatewayAttachment, original.InternetGatewayAttachment}, | ||
| {"RouteTable", restored.RouteTable, original.RouteTable}, | ||
| {"SecurityGroupid", restored.SecurityGroupid, original.SecurityGroupid}, | ||
| {"Instanceid", restored.Instanceid, original.Instanceid}, | ||
| {"PublicDnsName", restored.PublicDnsName, original.PublicDnsName}, | ||
| {"PublicSubnetid", restored.PublicSubnetid, original.PublicSubnetid}, | ||
| {"NatGatewayid", restored.NatGatewayid, original.NatGatewayid}, | ||
| {"PublicRouteTable", restored.PublicRouteTable, original.PublicRouteTable}, | ||
| {"CPSecurityGroupid", restored.CPSecurityGroupid, original.CPSecurityGroupid}, | ||
| {"WorkerSecurityGroupid", restored.WorkerSecurityGroupid, original.WorkerSecurityGroupid}, | ||
| {"EIPAllocationid", restored.EIPAllocationid, original.EIPAllocationid}, | ||
| {"IAMInstanceProfileArn", restored.IAMInstanceProfileArn, original.IAMInstanceProfileArn}, | ||
| } |
There was a problem hiding this comment.
The round-trip test asserts only the struct fields, but it does not cover the newly added PrivateRouteTable cache key at all. Given PrivateRouteTable exists in aws.go, this test suite will not catch regressions where that key is dropped or incorrectly mapped; consider adding coverage for it (or removing the constant if it isn't intended to be supported).
pkg/provider/aws/cache_test.go
Outdated
| // TestCacheRoundTrip verifies that all AWS struct fields survive a | ||
| // dumpCache -> unmarsalCache round-trip. If a field is not persisted, | ||
| // delete will fail to clean up the corresponding resource. | ||
| func TestCacheRoundTrip(t *testing.T) { | ||
| tmpDir, err := os.MkdirTemp("", "cache-roundtrip-*") | ||
| if err != nil { | ||
| t.Fatalf("failed to create temp dir: %v", err) | ||
| } | ||
| defer func() { _ = os.RemoveAll(tmpDir) }() | ||
|
|
||
| cacheFile := filepath.Join(tmpDir, "cache.yaml") | ||
|
|
||
| env := &v1alpha1.Environment{ | ||
| ObjectMeta: metav1.ObjectMeta{Name: "test-env"}, | ||
| Spec: v1alpha1.EnvironmentSpec{ | ||
| Provider: v1alpha1.ProviderAWS, | ||
| Instance: v1alpha1.Instance{ | ||
| Region: "us-west-2", | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| client := &Client{ | ||
| cacheFile: cacheFile, | ||
| Environment: env, | ||
| } | ||
|
|
||
| original := &AWS{ | ||
| Vpcid: "vpc-abc123", | ||
| Subnetid: "subnet-def456", | ||
| InternetGwid: "igw-ghi789", | ||
| InternetGatewayAttachment: "vpc-abc123", | ||
| RouteTable: "rtb-jkl012", | ||
| SecurityGroupid: "sg-mno345", | ||
| Instanceid: "i-pqr678", | ||
| PublicDnsName: "ec2-1-2-3-4.compute.amazonaws.com", | ||
| // New cluster networking fields | ||
| PublicSubnetid: "subnet-pub789", | ||
| NatGatewayid: "nat-abc123", | ||
| PublicRouteTable: "rtb-pub456", | ||
| CPSecurityGroupid: "sg-cp789", | ||
| WorkerSecurityGroupid: "sg-worker012", | ||
| EIPAllocationid: "eipalloc-345", | ||
| IAMInstanceProfileArn: "arn:aws:iam::123456789012:instance-profile/holodeck", | ||
| } | ||
|
|
||
| // Dump the cache | ||
| if err := client.dumpCache(original); err != nil { | ||
| t.Fatalf("dumpCache failed: %v", err) | ||
| } | ||
|
|
||
| // Read it back | ||
| restored, err := client.unmarsalCache() | ||
| if err != nil { | ||
| t.Fatalf("unmarsalCache failed: %v", err) | ||
| } |
There was a problem hiding this comment.
Spelling: the comment and error messages say unmarsalCache/unmarsalCache failed, but the correct term is "unmarshal". Even if the function name stays as-is for compatibility, the user-facing strings/comments should use the correct spelling.
| // Cluster networking cache keys | ||
| PublicSubnetID string = "public-subnet-id" | ||
| NatGatewayID string = "nat-gateway-id" | ||
| PrivateRouteTable string = "private-route-table-id" | ||
| PublicRouteTable string = "public-route-table-id" | ||
| CPSecurityGroupID string = "cp-security-group-id" | ||
| WorkerSecurityGroupID string = "worker-security-group-id" | ||
| EIPAllocationID string = "eip-allocation-id" | ||
| IAMInstanceProfileArn string = "iam-instance-profile-arn" |
There was a problem hiding this comment.
PR description states 8 new cluster networking cache keys are persisted/restored, but in this change set private-route-table-id is not written in dumpCache()/updateStatus(), and there is no dedicated struct field for it (it is instead mapped onto RouteTable during unmarshal). Either update the implementation to truly support all 8 keys, or adjust the PR description to match what is actually persisted.
Add shared cache constants and AWS struct fields required for production-grade cluster networking (D5 architecture decision). New cache keys: public-subnet-id, nat-gateway-id, private-route-table-id, public-route-table-id, cp-security-group-id, worker-security-group-id, eip-allocation-id, iam-instance-profile-arn. Updated dumpCache(), unmarsalCache(), and updateStatus() to persist and restore all new fields. Backward compatible: existing single-node fields are preserved unchanged. Includes round-trip test verifying all fields survive dumpCache -> unmarsalCache cycle for both cluster and single-node modes. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Fix gofmt alignment in constants and struct literals. Fix goimports grouping in cache_test.go. Fix PrivateRouteTable unmarshal case that was a no-op — it now correctly maps to the existing RouteTable field. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
eef309e to
e824c0e
Compare
Summary
public-subnet-id,nat-gateway-id,private-route-table-id,public-route-table-id,cp-security-group-id,worker-security-group-id,eip-allocation-id,iam-instance-profile-arnAWSstruct soClusterCache(which embedsAWS) inherits them automaticallydumpCache(),unmarsalCache(), andupdateStatus()to persist and restore all new fields -- ensures delete can clean up every resourceArchitecture Decision: D5 -- shared cache layer enables all subsequent cluster networking tasks to persist their resource IDs without further cache changes.
Backward Compatibility: All existing single-node fields (
SecurityGroupid,RouteTable,Subnetid) are preserved unchanged. Single-node mode is unaffected.Test plan
TestCacheRoundTrip-- verifies all 15 fields (8 original + 7 new) survive adumpCache->unmarsalCacheround-tripTestCacheRoundTripSingleNode-- verifies single-node backward compatibility (new fields default to empty strings)go test -count=1 ./pkg/provider/aws/...passes