-
Notifications
You must be signed in to change notification settings - Fork 25
[VC-46370] CyberArk: Skip deleted resources when converting data readings to snapshot #741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ings to snapshot - Skip resources marked deleted when extracting from DataReadings - Append only non-deleted runtime.Objects to avoid nil entries - Document that extractor functions exclude deleted resources - Update tests and example input to assert deleted resources are ignored - Document the motivation for the caching mechanism and its faults Signed-off-by: Richard Wall <richard.wall@cyberark.com>
cc0799c to
ca9ae2e
Compare
| "name": "deleted-secret-1", | ||
| "namespace": "team-2" | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is used by the tests in cmd/agent_test.go.
jetstack-secure/cmd/agent_test.go
Lines 34 to 41 in ca9ae2e
| t.Run("machinehub", func(t *testing.T) { | |
| arktesting.SkipIfNoEnv(t) | |
| runSubprocess(t, repoRoot, []string{ | |
| "--agent-config-file", filepath.Join(repoRoot, "examples/machinehub/config.yaml"), | |
| "--input-path", filepath.Join(repoRoot, "examples/machinehub/input.json"), | |
| "--machine-hub", | |
| }) | |
| }) |
By capturing the HTTP request using mitmproxy we can see the deleted items are included in the snapshot before the changes to the implementation and absent after the implementation change:
mitmproxy
make test-unit HTTPS_PROXY=localhost:8080
BEFORE:
tail -1 request.txt | jq
{
"agent_version": "development",
"cluster_id": "0e069229-d83b-4075-a4c8-95838ff5c437",
"cluster_name": "github-jetstack-secure-tests@cyberark.cloud.420375",
"k8s_version": "v1.27.6",
"secrets": [
{
"apiVersion": "v1",
"kind": "Secret",
"metadata": {
"name": "app-1-secret-1",
"namespace": "team-1"
}
},
{
"apiVersion": "v1",
"kind": "Secret",
"metadata": {
"name": "deleted-secret-1",
"namespace": "team-2"
}
}
],
"serviceaccounts": [],
"roles": [],
"clusterroles": [],
"rolebindings": [],
"clusterrolebindings": [],
"jobs": [],
"cronjobs": [],
"deployments": [],
"statefulsets": [],
"daemonsets": [],
"pods": [
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"name": "app-1-pod-1",
"namespace": "team-1"
}
},
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"name": "deleted-pod-1",
"namespace": "team-2"
}
}
]
}
AFTER:
tail -1 request-after.txt | jq
{
"agent_version": "development",
"cluster_id": "0e069229-d83b-4075-a4c8-95838ff5c437",
"cluster_name": "github-jetstack-secure-tests@cyberark.cloud.420375",
"k8s_version": "v1.27.6",
"secrets": [
{
"apiVersion": "v1",
"kind": "Secret",
"metadata": {
"name": "app-1-secret-1",
"namespace": "team-1"
}
}
],
"serviceaccounts": [],
"roles": [],
"clusterroles": [],
"rolebindings": [],
"clusterrolebindings": [],
"jobs": [],
"cronjobs": [],
"deployments": [],
"statefulsets": [],
"daemonsets": [],
"pods": [
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"name": "app-1-pod-1",
"namespace": "team-1"
}
}
]
}
| }, | ||
| }, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of how this test fails before the implementation change.
=== FAIL: pkg/client TestExtractResourceListFromReading/happy_path (0.00s)
client_cyberark_convertdatareadings_test.go:251:
Error Trace: /home/richard/projects/jetstack/jetstack-secure/pkg/client/client_cyberark_convertdatareadings_test.go:251
Error: "[0xc00051a028 0xc00051a030 0xc00051a038]" should have 2 item(s), but has 3
Test: TestExtractResourceListFromReading/happy_path
| Namespace: "team-1", | ||
| }, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of how this test fails before the implementation change:
=== FAIL: pkg/client TestConvertDataReadings/happy_path (0.00s)
client_cyberark_convertdatareadings_test.go:372:
Error Trace: /home/richard/projects/jetstack/jetstack-secure/pkg/client/client_cyberark_convertdatareadings_test.go:372
Error: Not equal:
expected: dataupload.Snapshot{AgentVersion:"", ClusterID:"success-cluster-id", ClusterName:"", ClusterDescription:"", K8SVersion:"v1.21.0", Secrets:[]runtime.Object{(*v1.Secret)(0xc000050780)}, ServiceAccounts:[]runtime.Object(nil), Roles:[]runtime.Object(nil), ClusterRoles:[]runtime.Object(nil), RoleBindings:[]runtime.Object(nil), ClusterRoleBindings:[]runtime.Object(nil), Jobs:[]runtime.Object(nil), CronJobs:[]runtime.Object(nil), Deployments:[]runtime.Object(nil), Statefulsets:[]runtime.Object(nil), Daemonsets:[]runtime.Object(nil), Pods:[]runtime.Object(nil)}
actual : dataupload.Snapshot{AgentVersion:"", ClusterID:"success-cluster-id", ClusterName:"", ClusterDescription:"", K8SVersion:"v1.21.0", Secrets:[]runtime.Object{(*v1.Secret)(0xc000050500), (*v1.Secret)(0xc000050640)}, ServiceAccounts:[]runtime.Object(nil), Roles:[]runtime.Object(nil), ClusterRoles:[]runtime.Object(nil), RoleBindings:[]runtime.Object(nil), ClusterRoleBindings:[]runtime.Object(nil), Jobs:[]runtime.Object(nil), CronJobs:[]runtime.Object(nil), Deployments:[]runtime.Object(nil), Statefulsets:[]runtime.Object(nil), Daemonsets:[]runtime.Object(nil), Pods:[]runtime.Object(nil)}
Diff:
--- Expected
+++ Actual
@@ -6,3 +6,3 @@
K8SVersion: (string) (len=7) "v1.21.0",
- Secrets: ([]runtime.Object) (len=1) {
+ Secrets: ([]runtime.Object) (len=2) {
(*v1.Secret)({
@@ -14,2 +14,35 @@
Name: (string) (len=5) "app-1",
+ GenerateName: (string) "",
+ Namespace: (string) (len=6) "team-1",
+ SelfLink: (string) "",
+ UID: (types.UID) "",
+ ResourceVersion: (string) "",
+ Generation: (int64) 0,
+ CreationTimestamp: (v1.Time) {
+ Time: (time.Time) {
+ wall: (uint64) 0,
+ ext: (int64) 0,
+ loc: (*time.Location)(<nil>)
+ }
+ },
+ DeletionTimestamp: (*v1.Time)(<nil>),
+ DeletionGracePeriodSeconds: (*int64)(<nil>),
+ Labels: (map[string]string) <nil>,
+ Annotations: (map[string]string) <nil>,
+ OwnerReferences: ([]v1.OwnerReference) <nil>,
+ Finalizers: ([]string) <nil>,
+ ManagedFields: ([]v1.ManagedFieldsEntry) <nil>
+ },
+ Immutable: (*bool)(<nil>),
+ Data: (map[string][]uint8) <nil>,
+ StringData: (map[string]string) <nil>,
+ Type: (v1.SecretType) ""
+ }),
+ (*v1.Secret)({
+ TypeMeta: (v1.TypeMeta) {
+ Kind: (string) "",
+ APIVersion: (string) ""
+ },
+ ObjectMeta: (v1.ObjectMeta) {
+ Name: (string) (len=9) "deleted-1",
GenerateName: (string) "",
Test: TestConvertDataReadings/happy_path
| // resources every 1 minute, which will cause unnecessary load on the apiserver. | ||
| // We need to look back at the Git history and understand whether this was done | ||
| // for good reason or due to some misunderstanding. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these comments for context and as a reminder that this all needs to be improved. If I had more time, I might have had a go at refactoring all this code allow the cache to be optional; or perhaps having a smaller cache containing only the deleted resources.
But that would have taken more time to code and much more time to test; we'd need to do extensive testing to ensure that the deleted resources are still being reported to Venafi control plane.
|
@wallrj-cyberark I agree with the changes and the comments, have not manually tested this change however. |
Fixes: https://venafi.atlassian.net/browse/VC-46370
The CyberArk discovery and context service doesn't need to know about deleted resources but because of the cache in the DynamicGatherer, short-lived deleted resources were being uploaded if they had been deleted during the five minutes preceding the upload. To further confuse things, the deleted items in the cache do not have the standard
.metadata.deletionTimestamp, because of a bug in the cache code.In the interests of a quick expedient fix which can be added to a patch release, I have modified the snapshot conversion code to exclude all deleted resources.
And I have added explanatory comments and TODO comments to the datagatherer package suggesting some long-term fixes.
Testing
I added some examples in the comments below showing how the modified unit tests fail without the modified implementation and showing the captured request body before and after this change.
I also ran the Ark E2E tests to verify that the snapshot upload still works E2E. Here's the output: