@wallrj-cyberark wallrj-cyberark commented Oct 28, 2025

Fixes: https://venafi.atlassian.net/browse/VC-46370

The CyberArk discovery and context service does not need to know about deleted resources. However, because of the cache in the DynamicGatherer, short-lived resources were still being uploaded if they had been deleted during the five minutes preceding the upload. To add to the confusion, the deleted items in the cache do not have the standard .metadata.deletionTimestamp, because of a bug in the cache code.

In the interest of a quick fix that can be included in a patch release, I have modified the snapshot conversion code to exclude all deleted resources.

I have also added explanatory comments and TODO comments to the datagatherer package suggesting some long-term fixes.

Testing

I added some examples in the comments below showing how the modified unit tests fail without the modified implementation and showing the captured request body before and after this change.

I also ran the Ark E2E tests to verify that the snapshot upload still works E2E. Here's the output:

$ make ark-test-e2e
...
{
  "ts": 1761664198703.4653,
  "caller": "identity/identity.go:403",
  "msg": "successfully completed AdvanceAuthentication request to CyberArk Identity; login complete",
  "v": 0,
  "logger": "Run.gatherAndOutputData.postData",
  "username": "XXXX"
}
{"ts":1761664201442.9082,"caller":"agent/run.go:417","msg":"Data sent successfully","v":0,"logger":"Run.gatherAndOutputData.postData"}
process_cpu_seconds_total 0.35
process_max_fds 1.073741816e+09
process_network_receive_bytes_total 338371
process_network_transmit_bytes_total 254369
process_open_fds 13
process_resident_memory_bytes 4.8377856e+07
process_start_time_seconds 1.76166417164e+09
process_virtual_memory_bytes 1.300512768e+09
process_virtual_memory_max_bytes 1.8446744073709552e+19
/ko-app/ark agent -c /etc/disco-agent/config.yaml --machine-hub --logging-format=json --enable-metrics --enable-pprof

…ings to snapshot

- Skip resources marked deleted when extracting from DataReadings
- Append only non-deleted runtime.Objects to avoid nil entries
- Document that extractor functions exclude deleted resources
- Update tests and example input to assert deleted resources are ignored
- Document the motivation for the caching mechanism and its faults

Signed-off-by: Richard Wall <richard.wall@cyberark.com>
@wallrj-cyberark wallrj-cyberark force-pushed the cyberark-skip-deleted-resources branch from cc0799c to ca9ae2e Compare October 28, 2025 16:04
"name": "deleted-secret-1",
"namespace": "team-2"
}
}

This file is used by the tests in cmd/agent_test.go.

t.Run("machinehub", func(t *testing.T) {
	arktesting.SkipIfNoEnv(t)
	runSubprocess(t, repoRoot, []string{
		"--agent-config-file", filepath.Join(repoRoot, "examples/machinehub/config.yaml"),
		"--input-path", filepath.Join(repoRoot, "examples/machinehub/input.json"),
		"--machine-hub",
	})
})

Capturing the HTTP request with mitmproxy shows that the deleted items are included in the snapshot before the implementation change and absent after it:

mitmproxy
make test-unit HTTPS_PROXY=localhost:8080

BEFORE:

tail -1 request.txt  | jq
{
  "agent_version": "development",
  "cluster_id": "0e069229-d83b-4075-a4c8-95838ff5c437",
  "cluster_name": "github-jetstack-secure-tests@cyberark.cloud.420375",
  "k8s_version": "v1.27.6",
  "secrets": [
    {
      "apiVersion": "v1",
      "kind": "Secret",
      "metadata": {
        "name": "app-1-secret-1",
        "namespace": "team-1"
      }
    },
    {
      "apiVersion": "v1",
      "kind": "Secret",
      "metadata": {
        "name": "deleted-secret-1",
        "namespace": "team-2"
      }
    }
  ],
  "serviceaccounts": [],
  "roles": [],
  "clusterroles": [],
  "rolebindings": [],
  "clusterrolebindings": [],
  "jobs": [],
  "cronjobs": [],
  "deployments": [],
  "statefulsets": [],
  "daemonsets": [],
  "pods": [
    {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {
        "name": "app-1-pod-1",
        "namespace": "team-1"
      }
    },
    {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {
        "name": "deleted-pod-1",
        "namespace": "team-2"
      }
    }
  ]
}

AFTER:

tail -1 request-after.txt  | jq
{
  "agent_version": "development",
  "cluster_id": "0e069229-d83b-4075-a4c8-95838ff5c437",
  "cluster_name": "github-jetstack-secure-tests@cyberark.cloud.420375",
  "k8s_version": "v1.27.6",
  "secrets": [
    {
      "apiVersion": "v1",
      "kind": "Secret",
      "metadata": {
        "name": "app-1-secret-1",
        "namespace": "team-1"
      }
    }
  ],
  "serviceaccounts": [],
  "roles": [],
  "clusterroles": [],
  "rolebindings": [],
  "clusterrolebindings": [],
  "jobs": [],
  "cronjobs": [],
  "deployments": [],
  "statefulsets": [],
  "daemonsets": [],
  "pods": [
    {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {
        "name": "app-1-pod-1",
        "namespace": "team-1"
      }
    }
  ]
}

},
},
},
},

An example of how this test fails before the implementation change:

=== FAIL: pkg/client TestExtractResourceListFromReading/happy_path (0.00s)
    client_cyberark_convertdatareadings_test.go:251:
                Error Trace:    /home/richard/projects/jetstack/jetstack-secure/pkg/client/client_cyberark_convertdatareadings_test.go:251
                Error:          "[0xc00051a028 0xc00051a030 0xc00051a038]" should have 2 item(s), but has 3
                Test:           TestExtractResourceListFromReading/happy_path

Namespace: "team-1",
},
},
},

An example of how this test fails before the implementation change:

=== FAIL: pkg/client TestConvertDataReadings/happy_path (0.00s)
    client_cyberark_convertdatareadings_test.go:372:
                Error Trace:    /home/richard/projects/jetstack/jetstack-secure/pkg/client/client_cyberark_convertdatareadings_test.go:372
                Error:          Not equal:
                                expected: dataupload.Snapshot{AgentVersion:"", ClusterID:"success-cluster-id", ClusterName:"", ClusterDescription:"", K8SVersion:"v1.21.0", Secrets:[]runtime.Object{(*v1.Secret)(0xc000050780)}, ServiceAccounts:[]runtime.Object(nil), Roles:[]runtime.Object(nil), ClusterRoles:[]runtime.Object(nil), RoleBindings:[]runtime.Object(nil), ClusterRoleBindings:[]runtime.Object(nil), Jobs:[]runtime.Object(nil), CronJobs:[]runtime.Object(nil), Deployments:[]runtime.Object(nil), Statefulsets:[]runtime.Object(nil), Daemonsets:[]runtime.Object(nil), Pods:[]runtime.Object(nil)}
                                actual  : dataupload.Snapshot{AgentVersion:"", ClusterID:"success-cluster-id", ClusterName:"", ClusterDescription:"", K8SVersion:"v1.21.0", Secrets:[]runtime.Object{(*v1.Secret)(0xc000050500), (*v1.Secret)(0xc000050640)}, ServiceAccounts:[]runtime.Object(nil), Roles:[]runtime.Object(nil), ClusterRoles:[]runtime.Object(nil), RoleBindings:[]runtime.Object(nil), ClusterRoleBindings:[]runtime.Object(nil), Jobs:[]runtime.Object(nil), CronJobs:[]runtime.Object(nil), Deployments:[]runtime.Object(nil), Statefulsets:[]runtime.Object(nil), Daemonsets:[]runtime.Object(nil), Pods:[]runtime.Object(nil)}

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -6,3 +6,3 @@
                                  K8SVersion: (string) (len=7) "v1.21.0",
                                - Secrets: ([]runtime.Object) (len=1) {
                                + Secrets: ([]runtime.Object) (len=2) {
                                   (*v1.Secret)({
                                @@ -14,2 +14,35 @@
                                     Name: (string) (len=5) "app-1",
                                +    GenerateName: (string) "",
                                +    Namespace: (string) (len=6) "team-1",
                                +    SelfLink: (string) "",
                                +    UID: (types.UID) "",
                                +    ResourceVersion: (string) "",
                                +    Generation: (int64) 0,
                                +    CreationTimestamp: (v1.Time) {
                                +     Time: (time.Time) {
                                +      wall: (uint64) 0,
                                +      ext: (int64) 0,
                                +      loc: (*time.Location)(<nil>)
                                +     }
                                +    },
                                +    DeletionTimestamp: (*v1.Time)(<nil>),
                                +    DeletionGracePeriodSeconds: (*int64)(<nil>),
                                +    Labels: (map[string]string) <nil>,
                                +    Annotations: (map[string]string) <nil>,
                                +    OwnerReferences: ([]v1.OwnerReference) <nil>,
                                +    Finalizers: ([]string) <nil>,
                                +    ManagedFields: ([]v1.ManagedFieldsEntry) <nil>
                                +   },
                                +   Immutable: (*bool)(<nil>),
                                +   Data: (map[string][]uint8) <nil>,
                                +   StringData: (map[string]string) <nil>,
                                +   Type: (v1.SecretType) ""
                                +  }),
                                +  (*v1.Secret)({
                                +   TypeMeta: (v1.TypeMeta) {
                                +    Kind: (string) "",
                                +    APIVersion: (string) ""
                                +   },
                                +   ObjectMeta: (v1.ObjectMeta) {
                                +    Name: (string) (len=9) "deleted-1",
                                     GenerateName: (string) "",
                Test:           TestConvertDataReadings/happy_path

// resources every 1 minute, which will cause unnecessary load on the apiserver.
// We need to look back at the Git history and understand whether this was done
// for good reason or due to some misunderstanding.

I added these comments for context and as a reminder that this all needs to be improved. If I had more time, I might have had a go at refactoring all this code to allow the cache to be optional, or perhaps at using a smaller cache containing only the deleted resources.
But that would have taken more time to code and much more time to test; we'd need to do extensive testing to ensure that the deleted resources are still being reported to the Venafi control plane.

inteon commented Oct 28, 2025

@wallrj-cyberark I agree with the changes and the comments; however, I have not manually tested this change.

@wallrj-cyberark wallrj-cyberark merged commit 558fde0 into master Oct 28, 2025
3 checks passed
@wallrj-cyberark wallrj-cyberark deleted the cyberark-skip-deleted-resources branch October 28, 2025 17:08