merge: upstream/main + recover wildcards/signing on projection-v1#43
entlein wants to merge 41 commits into
Conversation
* feat: foundation for ContainerProfileCache unification (steps 1, 2, 5-early)
Additive-only scaffolding for the upcoming migration from the two
workload-keyed caches (applicationprofilecache + networkneighborhoodcache)
to a single container-keyed ContainerProfileCache. No consumers are
rewired yet; all new code is unused.
- Storage client: GetContainerProfile(namespace, name) on ProfileClient
interface + *Storage impl + mock.
- ContainerProfileCache interface + stub impl (methods return zero values;
filled in by step 3/4).
- Prometheus metrics: nodeagent_user_profile_legacy_loads_total{kind,completeness}
deprecation counter + reconciler SLO metrics (entries gauge, hit/miss
counter, tick duration histogram, eviction counter) registered up front
so later steps emit cleanly.
Plan artifacts in .omc/plans/; approved by ralplan Planner/Architect/Critic
consensus (v2, iteration 2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: ContainerProfileCacheImpl + projection + shared-pointer fast-path (steps 3, 3.5, 4)
- CachedContainerProfile entry with Shared/RV/UserAP/UserNNRV fields
- Option A+ fast-path: shared storage pointer when no user overlay
- projection.go ports mergeContainers/mergeNetworkNeighbors from legacy caches
- partial-profile detection with dedup'd WARN log + completeness metric label
- Event-path delete with WithLock+ReleaseLock (Critic #2 lock-gap fix)
- Unit tests T4 (projection) + T6 (callstack parity) + fast-path identity
Step 5 (reconciler) and legacy deletion land in follow-ups.
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
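The "Option A+" shared-pointer fast-path above can be sketched as follows. This is a hypothetical simplification (the `profile`, `entry`, and `buildEntry` names are illustrative, not the real types): when no user overlay exists there is nothing to merge, so the cache entry aliases the storage object's pointer instead of deep-copying it, and `Shared` records that the pointer is read-only by convention.

```go
package main

import "fmt"

// profile stands in for the real storage-owned ContainerProfile object.
type profile struct{ Name string }

// entry is a minimal stand-in for the cached entry described above.
type entry struct {
	Profile *profile
	Shared  bool // true: Profile aliases the storage object (must not be mutated)
}

// buildEntry takes the fast-path when there is no user overlay: the storage
// pointer is stored directly, avoiding a deep copy per container.
func buildEntry(base *profile, overlay *profile) entry {
	if overlay == nil {
		return entry{Profile: base, Shared: true} // fast-path: no copy
	}
	merged := *base // copy, then merge the overlay on top of the copy
	merged.Name = base.Name + "+" + overlay.Name
	return entry{Profile: &merged, Shared: false}
}

func main() {
	base := &profile{Name: "cp"}
	fast := buildEntry(base, nil)
	slow := buildEntry(base, &profile{Name: "user"})
	fmt.Println(fast.Profile == base, slow.Profile == base)
}
```

The fast-path identity test mentioned above asserts exactly this: the cached `Profile` pointer IS the storage object when `Shared=true`.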
* feat: ContainerProfileCache reconciler with evict + refresh (step 5)
- tickLoop drives evict + refresh on one goroutine, refresh gated by atomic
- reconcileOnce evicts entries whose pod is gone or container stopped
- refreshAllEntries snapshots IDs then refreshes outside Range to avoid a
SafeMap RLock/WLock deadlock (rebuildEntry calls Set)
- isContainerRunning(pod, entry, id): containerID primary, (Name, PodUID)
fallback for pre-running init containers with empty ContainerID
- ctx.Err() honored inside Range callbacks for graceful shutdown
- T8 end-to-end test: user-AP mutation -> cached projection reflects change
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Consensus deltas applied: #1 (isContainerRunning signature), #3 (ctx.Err),
#4 (extend fast-skip to overlay RVs), #5 (T8 test), #7 (RPC-cost comment).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
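The snapshot-then-refresh pattern above (avoiding the SafeMap RLock/WLock deadlock) can be sketched with a plain `sync.RWMutex`-backed map; the real cache uses the goradd/maps SafeMap, so types and names here are illustrative only. Calling `Set` (write lock) from inside `Range` (read lock held) deadlocks because Go's `RWMutex` is not reentrant; snapshotting the IDs first moves the mutation outside the read-locked region.

```go
package main

import (
	"fmt"
	"sync"
)

// safeMap is a minimal stand-in for the SafeMap used by the cache.
type safeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

// Range holds the read lock while iterating, like the real SafeMap.
func (s *safeMap) Range(f func(k string, v int) bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for k, v := range s.m {
		if !f(k, v) {
			return
		}
	}
}

// Set takes the write lock; calling it inside Range would deadlock.
func (s *safeMap) Set(k string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

// refreshAll snapshots the IDs under the read lock, then rebuilds each
// entry with no lock held, so Set cannot deadlock against Range.
func refreshAll(s *safeMap) {
	var ids []string
	s.Range(func(k string, _ int) bool { // read lock held only here
		ids = append(ids, k)
		return true
	})
	for _, id := range ids { // no lock held; Set is safe
		s.Set(id, 1)
	}
}

func main() {
	s := &safeMap{m: map[string]int{"a": 0, "b": 0}}
	refreshAll(s)
	fmt.Println(s.m["a"], s.m["b"])
}
```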
* feat: profilehelper CP->legacy-shape shims + ContainerProfileCache aggregator wiring (step 6a)
Adds the ContainerProfileCache reader to the ObjectCache aggregator interface
so profilehelper can read CP and synthesize the legacy ApplicationProfileContainer /
NetworkNeighborhoodContainer shapes for callers that haven't migrated yet.
- pkg/objectcache/objectcache_interface.go: add ContainerProfileCache() to
aggregator interface + mock (both AP/NN stay for 6a-6c transit)
- pkg/objectcache/v1/objectcache.go: add cp field, 5-arg NewObjectCache,
ContainerProfileCache() accessor
- pkg/objectcache/v1/mock.go: RuleObjectCacheMock implements CP surface +
Get/SetContainerProfile test helpers, Start stub
- pkg/rulemanager/profilehelper/profilehelper.go:
- GetContainerProfile(objectCache, id) returns (*CP, syncChecksum, error)
— the forward API
- GetContainerApplicationProfile + GetContainerNetworkNeighborhood rewritten
as ~30-LOC CP->legacy-shape shims (consensus delta #2). Marked deprecated;
step 6c deletes them after CEL callers migrate.
- cmd/main.go: construct ContainerProfileCache alongside APC+NNC, pass to
NewObjectCache; mock-path uses ContainerProfileCacheMock
- test call sites updated for 5-arg NewObjectCache
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: migrate 20 CEL call sites to GetContainerProfile (step 6b)
- applicationprofile/{capability,exec,http,open,syscall}.go: read fields
directly off cp.Spec instead of the per-container AP shape
- networkneighborhood/network.go: read Ingress/Egress/LabelSelector off
cp.Spec directly
- pkg/objectcache/v1/mock.go: extend RuleObjectCacheMock so
SetApplicationProfile / SetNetworkNeighborhood also project into the
unified ContainerProfile, and GetContainerProfile honours the shared
container-ID registry (preserves "invalid container ID -> no profile"
semantics for existing tests)
- profilehelper CP->legacy shims remain in place; step 6c removes them
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: delete profilehelper shims + migrate rule_manager + creator (step 6c)
- pkg/rulemanager/profilehelper/profilehelper.go: delete
GetContainerApplicationProfile, GetContainerNetworkNeighborhood,
GetApplicationProfile, GetNetworkNeighborhood, GetContainerFromApplicationProfile,
GetContainerFromNetworkNeighborhood — CP-direct API is the only surface now
- pkg/rulemanager/rule_manager.go:
- :202, :399 call profilehelper.GetContainerProfile instead of the shim
- HasFinalApplicationProfile reads cp via ContainerProfileCache().GetContainerProfile;
method name preserved (external API on RuleManagerInterface per plan v2 §2.4)
- pkg/rulemanager/rulepolicy.go: Validate takes *v1beta1.ContainerProfile
and reads cp.Spec.PolicyByRuleId
- pkg/rulemanager/ruleadapters/creator.go: both AP + NN branches use
ContainerProfileCache().GetContainerProfileState (unified state source)
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: ObjectCache aggregator CP-only + collapse 2 callbacks to 1 (step 6d)
- pkg/objectcache/objectcache_interface.go: drop ApplicationProfileCache()
and NetworkNeighborhoodCache() methods — the aggregator is now
{K8s, ContainerProfile, Dns}
- pkg/objectcache/v1/objectcache.go: 3-arg NewObjectCache(k, cp, dc)
- pkg/containerwatcher/v2/container_watcher_collection.go:63-64: two
ContainerCallback subscriptions (APC + NNC) collapse to one (CPC)
- cmd/main.go: both branches (runtime-detection + mock) construct only
ContainerProfileCache + Dns; legacy APC/NNC wiring removed with startup
log: "ContainerProfileCache active; legacy AP/NN caches removed"
- test call sites updated for 3-arg NewObjectCache
Legacy packages still physically present (imports retained where still
referenced, e.g. callstackcache); step 8 deletes them entirely.
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: delete legacy AP/NN cache packages + move callstackcache (step 8)
- git rm -r pkg/objectcache/applicationprofilecache/ (766 LOC)
- git rm -r pkg/objectcache/networkneighborhoodcache/ (758 LOC)
- git rm pkg/objectcache/applicationprofilecache_interface.go
- git rm pkg/objectcache/networkneighborhoodcache_interface.go
- mv pkg/objectcache/applicationprofilecache/callstackcache/
-> pkg/objectcache/callstackcache/ (domain-agnostic, shared)
- Update 4 importers: containerprofilecache_interface.go, v1/mock.go,
containerprofilecache.go, reconciler.go
- RuleObjectCacheMock drops ApplicationProfileCache()/NetworkNeighborhoodCache()
accessor methods; SetApplicationProfile/SetNetworkNeighborhood remain as
test helpers that project into the unified CP
- projection.go comments kept as historical source pointers — git history
preserves the originals
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: add T2 init-eviction, T5 packages-deleted, T7 lock-stress (step 9 partial)
- tests/containerprofilecache/packages_deleted_test.go: go/packages
dep-graph assertion that legacy AP/NN paths are absent
- tests/containerprofilecache/lock_stress_test.go: 100-goroutine
interleaved seed/read for same container IDs, 5s budget, race-safe
- tests/containerprofilecache/init_eviction_test.go: T2a (event-path
evict) + T2b (reconciler-path evict for missed RemoveContainer)
- tests/containerprofilecache/helpers_test.go: shared test builders
- pkg/objectcache/containerprofilecache: export ReconcileOnce and
SeedEntryForTest as out-of-package test hooks
- Makefile: check-legacy-packages target
T1 (golden-alert parity) and T3 (memory benchmark) are release-checklist
items per plan v2 §2.7 — the pre-migration baselines those tests require
can no longer be captured from this branch.
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Phase 4 review P1 findings
1. Drop ReleaseLock on delete paths (containerprofilecache.go deleteContainer,
reconciler.go reconcileOnce). Security review flagged a race where the
deleted mutex could be orphaned while a concurrent GetLock creates a new
one, breaking mutual exclusion for the same container ID. Trade-off:
bounded memory growth of stale lock entries, proportional to container
churn — acceptable for a node-agent lifetime.
2. Extract emitOverlayMetrics helper (metrics.go) to de-duplicate the
~20-line overlay metric/deprecation-warn block between buildEntry
(addContainer path) and rebuildEntry (refresh path). Keeps the two in
lockstep — code review flagged silent drift risk.
Not addressed in this commit (plan-accepted tradeoffs, follow-up work):
- Shared-pointer read-only invariant is convention-enforced, not type-
enforced (plan v2 §2.3 step 7, ADR consequences). Retaining as-is;
downstream consumers must not mutate.
- Storage RPC context propagation (requires storage.ProfileClient interface
change, out of scope for this migration).
Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: retry pending ContainerProfile GETs when CP appears after container-start
Component tests on PR kubescape#788 regressed with "All alerts: []" and 54+
"container X not found in container-profile cache" log entries. Root cause:
addContainer did a one-shot GetContainerProfile at EventTypeAddContainer
time and bailed on 404. But the CP is created asynchronously by
containerprofilemanager ~60s AFTER container-start, so the one-shot GET
almost always missed; the cache entry was never created; rule evaluation
short-circuited as "no profile".
Legacy caches hid this via a periodic ListProfiles scan that picked up
late-arriving profiles on the next tick. The point-lookup model dropped
that scan. This commit adds an equivalent: a pending-container retry path
in the reconciler.
Changes:
- CachedContainerProfile unchanged; new pendingContainer struct captures
(container, sharedData, cpName) needed to retry the initial GET.
- ContainerProfileCacheImpl.pending SafeMap records containerIDs waiting
for their CP to land in storage.
- addContainer extracts the populate/GET into tryPopulateEntry. On miss
(err or nil CP) it records a pending entry; the per-container goroutine
exits. No more waiting 10 min inside addContainerWithTimeout.
- reconciler.retryPendingEntries iterates pending under per-container
locks, re-issues the GET, and promotes via tryPopulateEntry on success.
- reconcileOnce gains a pending GC pass: containers whose pod is gone or
whose status is not Running get dropped from pending so we don't retry
forever on terminated containers.
- deleteContainer also clears from pending on EventTypeRemoveContainer.
- metrics: cache_entries gauge gains a "pending" kind; reconciler
eviction counter gets a "pending_pod_stopped" reason.
Tests:
- TestRetryPendingEntries_CPCreatedAfterAdd: 404 on add -> pending; CP
arrives in storage -> one tick promotes; exactly 2 GetCP calls.
- TestRetryPendingEntries_PodGoneIsGCed: pending entry dropped when its
pod is no longer present in k8s cache.
Full findings and resume doc at
.omc/plans/containerprofile-cache-component-test-findings.md
Follow-up plan updated at
.omc/plans/containerprofile-cache-followups.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
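The pending-retry flow above can be sketched as follows. This is a hypothetical, single-threaded simplification (`cache`, `tryPopulate`, the string-valued maps, and the error value are all illustrative stand-ins for the real SafeMaps and storage client): a miss on the initial GET parks the container in `pending` instead of giving up, and each reconciler tick re-issues the GET and promotes on success.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

type cache struct {
	store   map[string]string // stand-in for storage, keyed by CP name
	entries map[string]string // promoted cache entries, keyed by container ID
	pending map[string]string // containerID -> cpName awaiting storage
}

// tryPopulate is the extracted populate/GET path: it either creates the
// entry or reports a miss to the caller.
func (c *cache) tryPopulate(containerID, cpName string) error {
	cp, ok := c.store[cpName]
	if !ok {
		return errNotFound
	}
	c.entries[containerID] = cp
	return nil
}

// addContainer does a one-shot GET; on miss it records a pending entry
// instead of blocking or bailing permanently.
func (c *cache) addContainer(containerID, cpName string) {
	if err := c.tryPopulate(containerID, cpName); err != nil {
		c.pending[containerID] = cpName
	}
}

// retryPendingEntries runs each reconciler tick and promotes any pending
// container whose profile has since landed in storage.
func (c *cache) retryPendingEntries() {
	for id, name := range c.pending {
		if err := c.tryPopulate(id, name); err == nil {
			delete(c.pending, id)
		}
	}
}

func main() {
	c := &cache{store: map[string]string{}, entries: map[string]string{}, pending: map[string]string{}}
	c.addContainer("c1", "cp-c1") // profile not in storage yet -> pending
	c.store["cp-c1"] = "profile"  // CP created asynchronously later
	c.retryPendingEntries()       // next tick promotes it
	fmt.Println(c.entries["c1"], len(c.pending))
}
```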
* fix: cache correctness — right CP slug, partial-on-restart, overlay refs, resurrection guard
PR kubescape#788 component tests continued failing after the pending-retry fix.
Deep investigation uncovered a fundamental slug misuse and three reviewer-
reported correctness gaps. All fixed here.
### Primary bug: wrong slug function
plan v2 §2.3 asserted that GetOneTimeSlug(false) was deterministic. It is
NOT — implementation at k8s-interface v0.0.206:
```go
func (id *InstanceID) GetOneTimeSlug(noContainer bool) (string, error) {
    u := uuid.New()
    hexSuffix := hex.EncodeToString(u[:])
    ...
}
```
So containerprofilemanager.saveContainerProfile writes a *time-series* CP
per tick with a fresh UUID suffix, and the storage-side
ContainerProfileProcessor.consolidateKeyTimeSeries writes the consolidated
profile at the STABLE slug (GetSlug(false), no UUID).
The cache was querying for CPs at GetOneTimeSlug(false), so every GET 404'd
forever — even with the pending-retry in place. 13 component tests failed
with "All alerts: []" and 38+ "container X not found in container-profile
cache" log entries.
Switched addContainer to GetSlug(false). The refresh path inherits the
corrected name via entry.CPName.
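The bug class can be illustrated without the k8s-interface dependency (the slug functions below are hypothetical stand-ins, not the real `GetSlug`/`GetOneTimeSlug`): a "one-time" name embeds fresh randomness on every call, so a cache that looks objects up by it can never hit the object written under the stable name.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// stableSlug is deterministic: same inputs, same name, every call.
func stableSlug(workload, container string) string {
	return workload + "-" + container
}

// oneTimeSlug appends a fresh random suffix per call, like uuid.New()
// in the real GetOneTimeSlug: two calls never agree.
func oneTimeSlug(workload, container string) string {
	u := make([]byte, 16)
	rand.Read(u)
	return stableSlug(workload, container) + "-" + hex.EncodeToString(u)
}

func main() {
	stable := stableSlug("nginx", "app")
	a := oneTimeSlug("nginx", "app")
	b := oneTimeSlug("nginx", "app")
	// One-time names disagree with each other AND with the stable name,
	// so a GET keyed on them 404s forever:
	fmt.Println(a == b, a == stable, stableSlug("nginx", "app") == stable)
}
```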
### Reviewer #1: resurrection during refresh
refreshAllEntries snapshots entries without a lock. Between snapshot and
per-entry lock acquisition, deleteContainer or reconcile-evict may have
removed the entry. Previously, rebuildEntry's c.entries.Set(id, newEntry)
would resurrect the dead container.
Added a load-under-lock guard at the top of refreshOneEntry.
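The load-under-lock guard can be sketched like this (a hypothetical simplification with one mutex standing in for the per-container locks): because the ID snapshot is taken without a lock, the entry may be deleted before the rebuild acquires the lock, so `refreshOne` must re-check existence under the lock before `Set`.

```go
package main

import (
	"fmt"
	"sync"
)

type cache struct {
	mu      sync.Mutex
	entries map[string]string
}

// refreshOne re-checks existence under the lock; a concurrent delete
// between snapshot and rebuild must not be undone by the Set.
func (c *cache) refreshOne(id, rebuilt string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.entries[id]; !ok {
		return false // deleted since the snapshot: do not resurrect
	}
	c.entries[id] = rebuilt
	return true
}

func (c *cache) delete(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, id)
}

func main() {
	c := &cache{entries: map[string]string{"c1": "v1"}}
	ids := []string{"c1"} // snapshot taken without the lock
	c.delete("c1")        // concurrent event-path delete wins the race
	ok := c.refreshOne(ids[0], "v2")
	_, present := c.entries["c1"]
	fmt.Println(ok, present)
}
```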
### Reviewer #2: overlay handling regressions (two parts)
(a) tryPopulateEntry returned "pending" on base-CP 404 BEFORE trying
user-AP/NN. Containers with only a user-defined profile (no base CP yet)
got no entry. Restructured: fetch base CP and user-AP/NN independently;
populate if ANY source is available; synthesize an empty base CP when only
the overlay exists so projection has something to merge into.
(b) UserAPRef / UserNNRef were only recorded on successful fetch. A
transient 404 on add would permanently drop the overlay intent — the
refresh path had nothing to re-fetch. Now, when the label is set, the
refs are always recorded, using the label's name and the container's
namespace. Refresh retries the fetch each tick.
### Reviewer #3: partial profiles reused across container restart
tryPopulateEntry blindly used whatever CP existed at the stable slug,
including Partial completions from the previous container incarnation.
Legacy caches explicitly deleted Partial profiles on non-PreRunning
restart so rule evaluation fell through to "no profile" until Full
arrived.
Now: if CP.completion == Partial && !sharedData.PreRunningContainer, we
treat the CP as absent → stay pending → retry each tick. When the CP
becomes Full (or the container stops), the pending state resolves.
The inverse is preserved: PreRunningContainer (agent-restart scenario)
accepts the Partial CP as-is so Test_19's "alert on partial profile"
semantics still work.
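The restart gate and its PreRunning inverse reduce to a small predicate. A minimal sketch, with hypothetical names (the real code reads `CP.completion` and `sharedData.PreRunningContainer`):

```go
package main

import "fmt"

// acceptProfile decides whether a fetched profile may populate the cache.
// A Partial profile from a previous container incarnation is treated as
// absent (stay pending, retry each tick); a pre-running container at
// agent restart accepts the Partial profile as-is.
func acceptProfile(completion string, preRunning bool) bool {
	if completion == "Partial" && !preRunning {
		return false
	}
	return true
}

func main() {
	fmt.Println(acceptProfile("Partial", false)) // restarted container: wait for Full
	fmt.Println(acceptProfile("Partial", true))  // agent restart: accept as-is
	fmt.Println(acceptProfile("Full", false))    // Full is always accepted
}
```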
### Tests
Five new unit tests, all race-clean:
- TestPartialCP_NonPreRunning_StaysPending
- TestPartialCP_PreRunning_Accepted
- TestOverlayLabel_TransientFetchFailure_RefsRetained
- TestRefreshDoesNotResurrectDeletedEntry
- TestUserDefinedProfileOnly_NoBaseCP
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: read workload-level AP/NN as primary data source
The storage server's consolidated ContainerProfile is not exposed via the
public k8s API — ContainerProfiles().Get(stableName) returns 404 even after
consolidation runs. Only time-series CPs (named <stable>-<UUID>) and the
server-aggregated ApplicationProfile / NetworkNeighborhood CRs at the
workload-name are queryable.
The component tests' WaitForApplicationProfileCompletion waits for the
workload-level AP/NN completion — that's what actually exists. The legacy
caches read these directly; we do the same now while the server-side
consolidated-CP plumbing is completed.
Changes:
- addContainer computes both cpName (per-container, forward-compat) and
workloadName (per-workload, where AP/NN live) via GetSlug(false) and
GetSlug(true) respectively.
- tryPopulateEntry fetches consolidated CP (kept for forward-compat),
workload AP, and workload NN. Treats the workload AP/NN as the primary
data source when the consolidated CP isn't available.
- projection pre-merges workloadAP + workloadNN onto the base (synthesized
when CP is 404), then buildEntry applies user-overlay AP/NN on top.
- Partial-on-restart gate extended to cover workload AP/NN too — non
PreRunning containers ignore partial workload profiles until they
become Full, mirroring legacy deletion-on-restart semantics.
- pendingContainer gains workloadName so retries re-fetch the right CRs.
- fakeProfileClient gains overlayOnly field so tests can scope AP/NN
returns to the overlay name; existing TestOverlayPath_DeepCopies updated
accordingly.
Forward-compat: once storage publishes a queryable consolidated CP at
cpName, its fetch becomes primary and the workload AP/NN path becomes a
fallback. No API changes are required to make that transition — just drop
the workload-level fetches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* debug: add tick-loop start log + change-detection log in reconciler
* fix: remove overly-aggressive pending GC that dropped entries before retry
CI run 24781030436 (commit ce32919) proved the reconciler IS ticking with
retryPendingEntries running, but the pending-GC pass in reconcileOnce was
dropping every pending entry on the first tick (pending_before=4 →
pending_after=0 at the FIRST tick, before retryPendingEntries could run).
Root cause: the GC pass asked k8sObjectCache.GetPod(ns, pod) and also
checked isContainerRunning. On a busy node, the k8s pod cache and
ContainerStatuses lag the containerwatcher Add event by tens of seconds.
So "pod not found" or "container not yet Running" routinely returned true
for a container that had just been registered, causing GC to remove the
pending entry immediately. Retries then ran against an empty pending map
→ no promotions → alerts fired without profile → test failure.
Change: remove the pending GC entirely. Cleanup for terminated containers
flows through deleteContainer (EventTypeRemoveContainer) which clears
both entries and pending under the per-container lock. Memory growth is
bounded by the node's container churn (containers that never got a
profile during their lifetime).
Test updated: TestRetryPendingEntries_PodGoneIsGCed replaced by
TestPendingEntriesAreNotGCedBeforeRetry which asserts the new semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: merge user-managed AP/NN and refresh workload-level sources
Two component-test regressions in PR kubescape#788:
Fix A (Test_12 / Test_13): the cache now reads the user-managed
ApplicationProfile and NetworkNeighborhood published at
"ug-<workloadName>" and projects them onto the base profile as a
dedicated ladder pass. Legacy caches did this via the
`kubescape.io/managed-by: User` annotation in handleUserManagedProfile;
we read them directly by their well-known name.
Fix B (Test_17 / Test_19): the reconciler refresh path re-fetches the
workload-level AP/NN (and user-managed / label-referenced overlays) on
every tick, not just the consolidated CP. This propagates the Status=
"ready" -> "completed" transition into the cached ProfileState, which
flips fail_on_profile from false to true at rule-eval time.
CachedContainerProfile gained WorkloadName plus WorkloadAPRV /
WorkloadNNRV / UserManagedAPRV / UserManagedNNRV fields so the refresh
can fast-skip when every source's RV matches what's cached.
refreshOneEntry's rebuild now runs the same projection ladder as
tryPopulateEntry: base CP (or synthesized) → workload AP+NN →
user-managed (ug-) AP+NN → label-referenced user AP+NN.
Also:
- Tick-loop log only fires when entries OR pending count actually moved
(previously fired whenever pending_before > 0, producing per-tick
noise while a stuck-pending entry waited for profile data).
- fakeProfileClient in tests returns userManagedAP/userManagedNN when
the requested name starts with "ug-".
- New tests: TestWorkloadAPMerged_AndRefreshUpdatesStatus (Fix B
happy-path) and TestUserManagedProfileMerged (Fix A happy-path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: reconcileOnce no longer evicts on pod-cache lag, only on Terminated
CI run 24783250693 (commit 32a76c0) showed reconcileOnce evicting live
entries every tick: "entries_before:10, entries_after:0" within 5 seconds
of the agent starting. Same class of bug as the pending-GC fix (c45803f):
the k8s pod cache lags ContainerCallback Add events by tens of seconds,
and evicting on "GetPod returns nil OR !isContainerRunning" churned every
entry before any rules could evaluate.
Change reconcileOnce eviction gate:
- If pod is missing from k8s cache: DO NOT evict. Cache lag is transient;
deleteContainer handles real-world cleanup via EventTypeRemoveContainer.
- If pod present and container is clearly Terminated: evict (preserves
init-container eviction for Test_02 and T2 acceptance).
- If pod present and container in Waiting state: retain (new container
creation, init-container pre-run both legitimately pass through Waiting).
New helper isContainerTerminated mirrors isContainerRunning but gates on
State.Terminated only; "not found in any status" treated as terminated.
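A minimal sketch of that gate, using a simplified status type instead of the real corev1.ContainerStatus (names and shape are assumptions for illustration):

```go
package main

import "fmt"

// containerStatus is a stand-in for corev1.ContainerStatus.
type containerStatus struct {
	Name       string
	Terminated bool
}

// terminated gates on the Terminated state only: Waiting and Running both
// retain the entry; a container absent from every status is treated as
// terminated, per the semantics described above.
func terminated(name string, statuses []containerStatus) bool {
	for _, s := range statuses {
		if s.Name == name {
			return s.Terminated
		}
	}
	return true // not found in any status: treated as terminated
}

func main() {
	fmt.Println(terminated("app", nil))                                               // absent: evict
	fmt.Println(terminated("app", []containerStatus{{Name: "app", Terminated: true}})) // evict
	fmt.Println(terminated("app", []containerStatus{{Name: "app"}}))                  // waiting/running: keep
}
```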
Tests:
- TestReconcilerEvictsWhenPodMissing → TestReconcilerKeepsEntryWhenPodMissing
- New TestReconcilerEvictsTerminatedContainer
- New TestReconcilerKeepsWaitingContainer
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: drop workload-level AP/NN fetch; CP-direct reading is authoritative
The workload-level AP/NN fetch added in d27be01 was a workaround for the
eviction/GC bugs (fixed in c45803f and d9ae0ac), not an architectural
need. The consolidated ContainerProfile IS queryable at the GetSlug(false)
name once storage aggregation runs; the cache simply needs to wait on the
pending-retry path.
This reverts the workload-AP/NN read while keeping:
- consolidated CP as the single base-profile source
- user-managed AP/NN at "ug-<workloadName>" (merged on top) — still needed
because user-managed profiles are authored independently and are not
consolidated into the CP server-side
- user-defined overlay via pod UserDefinedProfileMetadataKey label
- eviction fix (d9ae0ac), GC fix (c45803f), resurrection guard
Removed:
- workload-AP/NN fetch in tryPopulateEntry and refreshOneEntry
- WorkloadAPRV / WorkloadNNRV fields on CachedContainerProfile and the
corresponding rebuildEntryFromSources ladder pass
- Partial-on-restart gate for workload AP/NN (only applies to CP now)
- Synth-CP annotation fallback chain (simplified to Completed/Full)
Tests:
- TestWorkloadAPMerged_AndRefreshUpdatesStatus → TestRefreshUpdatesCPStatus
(CP now the source; RV transition propagates Status)
- TestUserManagedProfileMerged rewired to use a real base CP + ug- overlay
instead of workloadAP + ug- overlay
This matches the migration plan's original intent: CP-direct, AP/NN only
as user overlays.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: synthetic entry CPName override, PodUID backfill, phase-labeled reconciler histogram
Three review findings from the post-green audit.
### 1 (High) — synthetic entry stored the wrong CPName
When tryPopulateEntry synthesized a CP (consolidated CP still 404), the
synthetic name was workloadName or overlayName, and buildEntry persisted
entry.CPName = cp.Name (i.e. the synthetic name). refreshOneEntry then
queried the synthetic name instead of the real GetSlug(false) name; with
the stored RV also empty, the fast-skip's "absent matches empty" branch
kept the synthetic entry forever and the real consolidated CP could never
replace it.
Fix: after buildEntry, override entry.CPName = cpName (the real
GetSlug(false) result passed into tryPopulateEntry).
### 2 (Medium) — PodUID never backfilled
buildEntry only sets PodUID when the pod is already in k8sObjectCache at
add time. On busy nodes the pod cache lags, so addContainer often runs
before the pod lands and PodUID stays "". isContainerTerminated's
empty-ContainerID fallback matches against (ContainerName, PodUID);
when PodUID == "" and the status also has empty UID, the loop falls
through and returns true (treat as terminated) — evicting a still-live
init container. rebuildEntryFromSources copied prev.PodUID unchanged, so
the error never healed.
Fix: in rebuildEntryFromSources, if prev.PodUID is empty AND the pod is
now in the k8s cache, use the fresh UID.
### 3 (Low) — reconciler duration histogram mixed two phases
tickLoop (evict) and refreshAllEntries (refresh) both emitted
ReportContainerProfileReconcilerDuration into the same plain Histogram,
so nodeagent_containerprofile_reconciler_duration_seconds was a blend of
two very different workloads. Plan v2 §2.9 had specified a HistogramVec
with a "phase" label from the start.
Fix: MetricsManager.ReportContainerProfileReconcilerDuration(phase, d).
Prometheus implementation becomes a HistogramVec with phase label.
tickLoop emits phase="evict", refreshAllEntries emits phase="refresh".
MetricsMock/MetricsNoop signatures updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address all CodeRabbit review comments on PR kubescape#788
- ContainerProfileCacheMock.GetContainerProfileState returns synthetic
error state instead of nil, matching the real impl's contract
- Remove IgnoreContainer check on EventTypeRemoveContainer to prevent
stale entries when pod labels change after Add
- Deep-copy userAP/userNN in mergeApplicationProfile and
mergeNetworkNeighborhood to eliminate aliasing of nested slices
(Execs[i].Args, Opens[i].Flags, MatchExpressions[i].Values, etc.)
into the cached ContainerProfile
- Fix Shared=true bug: buildEntry now takes userManagedApplied bool;
fast-path only sets Shared=true when no overlay was applied at all,
matching rebuildEntryFromSources logic in reconciler.go
- isContainerTerminated returns false when all status slices are empty
(kubelet lag guard for brand-new pods)
- Fix misplaced doc comment above GetContainerProfile in storage layer
- Remove unused (*stubStorage).setCP test helper
- Lock stress test evict path now uses ContainerCallback(Remove) to
exercise deleteContainer and per-container locking
- RuleObjectCacheMock stores per-container profiles in cpByContainerName;
GetContainerProfile resolves via InstanceID.GetContainerName();
GetContainerProfileState returns synthetic error state
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: thread context.Context through ProfileClient and add per-call RPC budget
All five ProfileClient methods now accept ctx as their first argument so
callers can enforce cancellation and deadline propagation. Each storage RPC
in the reconciler is wrapped via refreshRPC(ctx, ...) which applies a
configurable per-call timeout (config.StorageRPCBudget, default 5 s) on top
of the parent context, preventing a slow API server from stalling an entire
reconciler burst. Tests cover the fast-skip, rebuild, and context-cancellation
mid-RPC paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: shared-pointer race-fuzz test + WarmContainerLocksForTest helper
Add TestSharedPointerReadersDoNotCorruptCache: 50 concurrent readers
traverse the returned *ContainerProfile slices while a writer goroutine
alternately calls RefreshAllEntriesForTest + SeedEntryForTest to keep
entry rebuilds active. Runs for 500ms under -race, proving the shared-
pointer fast-path never produces a concurrent read/write pair.
Also add TestSharedPointerFastPathPreservesPointerIdentity: after a
refresh against a storage object with a newer RV, the new entry's
Profile pointer IS the storage object (Shared=true, no DeepCopy), which
keeps the T3 memory budget intact.
Fix the pre-existing goradd/maps SafeMap initialisation race in
TestLockStressAddEvictInterleaved by pre-warming containerLocks via the
new WarmContainerLocksForTest helper (the previous pre-warm via
SeedEntryForTest only covered the entries SafeMap, not containerLocks).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: document SetApplicationProfile / SetNetworkNeighborhood field partition in mock
Add a block comment above RuleObjectCacheMock spelling out the non-overlapping
cp.Spec field partition between the two setters and the first-container-wins
rule for r.cp. Without this, future callers risk aliasing NN fields into an
AP-only profile or vice-versa.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: T8 integration mirror, mock setter contract doc, SeedEntryWithOverlayForTest
Add SeedEntryWithOverlayForTest helper so out-of-package integration tests can
set UserAPRef / UserNNRef (which use the internal namespacedName type) without
requiring the type to be exported.
Mirror TestT8_EndToEndRefreshUpdatesProjection at tests/containerprofilecache/
using only the public + test-helper API: seeds an entry with a stale UserAPRV,
mutates storage to apV2 (RV=51), asserts RefreshAllEntriesForTest rebuilds the
projection with the new execs and drops the stale ones.
Add top-of-file block comment to RuleObjectCacheMock documenting the non-
overlapping AP-fields / NN-fields partition between SetApplicationProfile and
SetNetworkNeighborhood and the first-container-wins rule for r.cp.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address Phase 4 code-review findings
- reconciler.go: simplify dead-code cpErr/rpcErr guard (refreshRPC returns
exactly cpErr; the rpcErr != nil && cpErr == nil branch could never fire)
- reconciler_test.go: make blockingProfileClient.blocked a buffered chan(1)
with a blocking send so the signal is never silently dropped; bump
rpcBudget to 100ms and timeout to 2s to reduce flakiness on loaded CI
- containerprofilecache.go: extract defaultStorageRPCBudget const alongside
defaultReconcileInterval for discoverability
- shared_pointer_race_test.go: fix gofmt const-block alignment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: preserve cached entry when overlay AP/NN fetch fails transiently
Before this fix, a refreshRPC timeout on any overlay GET (user-managed
ug-<workload> AP/NN or user-defined label-referenced AP/NN) left the
overlay variable nil with the error silently discarded. The RV comparison
then saw rvOf(nil)="" != cached RV (e.g. "50"), treated it as a removal,
and rebuilt the entry without the overlay — temporarily stripping
user-managed/user-defined profile data from the cache and altering
alerting until the next successful tick.
Fix: capture each overlay's fetch error and, when it is non-nil and the
entry already has a non-empty cached RV for that overlay, return early
and keep the existing entry unchanged. Legitimate deletions (nil with
err==nil) still propagate correctly. Mirrors the existing CP error-
preservation logic at refreshOneEntry:272-288.
Add TestRefreshPreservesEntryOnTransientOverlayError covering all four
overlay fetch paths (user-managed AP, user-managed NN, user-defined AP,
user-defined NN) via a new overlayErrorClient stub.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address CodeRabbit review issues on PR kubescape#788
- Rename 5 CP cache metrics from nodeagent_* to node_agent_* to match
the existing metric namespace convention used across node-agent.
- Route all 5 storage GETs in tryPopulateEntry through refreshRPC so
they respect the per-call SLO (default 5s); prevents a hung GET from
blocking the entire reconciler tick loop when called from
retryPendingEntries.
- Add WarmPendingForTest helper to pre-initialise the pending SafeMap
before concurrent test phases, preventing the goradd/maps
nil-check-before-lock initialisation race.
- Pre-warm pending SafeMap in TestLockStressAddEvictInterleaved and
poll for async deleteContainer goroutines to drain before asserting
goroutine count.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: distinct RNG seed per stress-test worker
Pass worker index into each goroutine closure and mix it into the
rand.NewSource seed (time.Now().UnixNano() + int64(worker)), so that
100 concurrently-launched goroutines don't all receive the same
nanosecond timestamp and end up with identical add/evict sequences.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: move test helpers out of production source into testing.go
The six *ForTest / ReconcileOnce helpers were previously mixed into
containerprofilecache.go alongside production logic. Move them to a
dedicated testing.go file in the same package.
export_test.go is the idiomatic alternative but is compiled only when
running tests in the same directory; test packages in other directories
(tests/containerprofilecache/) import the non-test version of the
package and never see _test.go contents. A plain testing.go is the
correct pattern here — it signals "test support" by name and groups all
scaffolding in one place, while remaining importable by any test binary.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: move integration tests into package dir; use export_test.go
export_test.go (package containerprofilecache) is only compiled during
`go test` so test helpers never enter the production binary. This only
works when callers are in the same directory; the prior layout put tests
in tests/containerprofilecache/ (a separate package), forcing helpers
into a plain testing.go that shipped in the binary.
Moving the six test files into pkg/objectcache/containerprofilecache/
as package containerprofilecache_test fixes this correctly:
- export_test.go replaces testing.go (test-binary-only)
- package declaration: containerprofilecache_integration → containerprofilecache_test
- packages_deleted_test.go Dir path: ../.. → ../../.. (module root)
- tests/containerprofilecache/ directory removed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: nil out overlay pointers when k8s client returns zero-value on 404
The Kubernetes generated client (gentype.Client.Get) pre-allocates a
zero-value struct before the HTTP call and returns it as the result even
on error (e.g. 404 not-found). In refreshOneEntry, the four overlay
fetch paths (userManagedAP, userManagedNN, userAP, userNN) guarded only
the "transient error with cached RV → keep old entry" branch; the
"first-time 404, no cached RV" branch fell through with a non-nil
empty-ObjectMeta struct still in the pointer, which reached
rebuildEntryFromSources → emitOverlayMetrics and logged spurious
"user-authored legacy profile merged" warnings with empty
namespace/name/resourceVersion fields.
Add an explicit nil-out after each non-returning error branch, mirroring
the pattern already used in tryPopulateEntry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
…vice file handling (kubescape#791) Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
…kubescape#798)
* perf: disable file-digest/metadata/executable catalogers
These three catalogers iterate every file in the scan tree and dominate
transient allocation, but their outputs are not consumed by the
OOM-relevant SBOM path. Disabling them saves ~200 MB peak RSS on
gitlab-ee (main) and stacks with upstream selective-indexing +
binary-prefilter improvements to ~1.12 GB total (vs 1.62 GB baseline,
fits 1.5 GB cgroup).
Signed-off-by: Ben <ben@armosec.io>
* deps: switch to kubescape/syft v1.32.0-ks.2 for memory reduction
Routes anchore/syft imports to the kubescape fork via replace directive.
The fork carries selective indexing + binary-cataloger pre-filtering on
top of v1.32.0; combined with the file-cataloger disable in the parent
commit, this reduces gitlab-ee scan peak RSS from 1,621 MB to 1,123 MB.
Refs: NAUT-1283
Signed-off-by: Ben <ben@armosec.io>
* fix: check dep.Replace for actual fork version; add cataloger removals to sidecar
- packageVersion() now returns dep.Replace.Version when present so the
fork tag (v1.32.0-ks.2) propagates to runtime metadata and
version-gating logic
- pkg/sbomscanner/v1/server.go: add the same
WithCatalogerSelection/WithRemovals as sbom_manager.go so both SBOM
paths drop file-digest/metadata/executable catalogers and stay
consistent in memory behaviour
Signed-off-by: Ben <ben@armosec.io>
* fix: keep syft tool version at required version
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Signed-off-by: Ben <ben@armosec.io>
Co-authored-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The application-profile recorder in ReportFileExec derives the path it
stores into the AP from `args[0]`, while the rule-side resolver
(`parse.get_exec_path` in pkg/rulemanager/cel/libraries/parse/parse.go)
falls back to `comm` when `args[0]` is empty. The asymmetry causes
"Unexpected process launched" (R0001) and other ap.was_executed-based
rules to fire on processes that are present in the application profile.
Trigger: fexecve / execveat with AT_EMPTY_PATH. modern libpam (>= 1.5)
invokes its helpers (unix_chkpwd, unix_update, ...) via fexecve to avoid
TOCTOU on the helper path. The kernel implements fexecve as
execveat(fd, "", argv, envp, AT_EMPTY_PATH) — pathname is empty by
design.
Inspektor Gadget's trace_exec puts the syscall pathname into args[0]
and reads argv from index 1 (gadgets/trace_exec/program.bpf.c:146-153).
For fexecve/execveat empty-pathname, this produces args = ["", argv[1]]
in the agent's exec event. The recorder then sets path = args[0] = ""
and the AP entry is unreachable to ap.was_executed("unix_chkpwd")
(which the rule-side resolver computes via the empty-args[0] -> comm
fallback).
Fix: derive the recorder's path the same way the rule-side does — prefer
exepath (the kernel-authoritative exe_file path, immune to argv[0]
spoofing too), then argv[0] when non-empty, then comm.
Concrete impact in production: 408 of 1976 Bonial I013 incidents on
production scoring-api APs are exactly this case — cron user-context
setup invokes pam_unix -> unix_chkpwd via fexecve, AP records path: ""
with args ["", "root"], rule looks up "unix_chkpwd" via comm fallback,
no match.
The new resolveExecPath helper is also more defensive against argv[0]
spoofing in general — exepath comes from task->mm->exe_file in the BPF
side and cannot be controlled by user code.
Verified locally on a kind cluster with kubescape v0.3.94: a pod that
loops execve (control) and execveat-AT_EMPTY_PATH (bug) reproduces the
production-shape AP entry on the unfixed code path.
Signed-off-by: Ben <ben@armosec.io>
* implement Rule-Aware Profile Projection
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* fix: address CodeRabbit review comments (batch 2)
- profiledata.go: reset receiver in UnmarshalJSON/YAML for ProfileDataRequired
and FieldRequirement; add PatternObject unknown-field rejection
- function_cache.go: include SyncChecksum in cache key to invalidate on
profile content changes (not only spec changes); iterate all extraKeyFn callbacks
- rule_manager.go: gate strict-validation rejection behind StrictValidation flag;
coalesce specNotify bursts before recompile
- exec.go: document wasExecutedWithArgs v1 limitation for rule authors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* docs: document wasExecutedWithArgs v1 path-only matching limitation
Add a CEL Helper Limitations table to the Detection Rules section noting
that wasExecutedWithArgs currently performs path-only matching (equivalent
to wasExecuted) and does not validate the argument list in v1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: pass-through all profile data when no rules declare profileDataRequired
When InUse=false (no rule declared a requirement for a field), projectField
was returning an empty ProjectedField{}, causing CEL helpers to see no profile
data and fire false-positive alerts for every exec/open/capability/etc.
Fix: treat InUse=false as All=true (pass-through), so existing rules that omit
profileDataRequired continue working with the full raw profile.
Update TestApply_NilSpec, TestApply_DynamicNotRetainedWhenNotInUse (renamed),
and TestSpecChange_TriggersReprojection to reflect the new pass-through semantics.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update stale comments and strengthen reprojection test
- projection_apply.go: update Apply doc-comment and dynamic-patterns
comment to reflect pass-through semantics (InUse=false retains all data)
- reconciler_test.go: add SpecHash assertions to TestSpecChange to prove
reprojection actually occurred rather than testing pass-through twice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: improve error logging for user-managed resource fetch failures
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: add profileDataRequired field for rule-aware projection requirements
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: enhance profileDataRequired field to allow additional properties for rule-aware projection
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: update profileDataRequired field to preserve unknown fields for rule-aware projection
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: run malicious job from /app to use the rule watched path
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: change working directory for malicious job to /var/log
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: increase timeout for helm upgrade and kubectl wait in component tests; update malicious job to include command and args
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: update malicious job working directory to /tmp and modify command for service account token access
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: update malicious job to read environment variables from /proc/self/environ
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: create marker file in /var/lib/r0002-test for malicious job
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* feat: enable file access anomalies detection (R0002)
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
---------
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
get services from API, removing sidecar requirement
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
* fix: add timeout and file-based fallback to LoadServiceURLs
- Bound HTTP service discovery to 10 s so a slow/unreachable API cannot
stall node-agent startup; failure is handled gracefully by the existing
nil-check at the call site.
- Restore SERVICES env var / /etc/config/services.json fallback (using
ServiceDiscoveryFileV3) so sidecar deployments retain scan-failure
reporting without requiring migration to API_URL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
… surface
Living documentation for the feat/network-wildcards work. Each fixture
is a complete, kubectl-applicable NetworkNeighborhood document
exercising ONE edge case in the v0.0.2 wildcard surface. Test_34
(forthcoming) consumes them directly; users learning the syntax can
copy-paste them as authoritative examples.
Coverage:
01 — IPv4 literal in ipAddresses[]
02 — IPv6 literal (canonicalisation)
03 — IPv4 CIDR
04 — IPv6 CIDR
05 — '*' sentinel for ANY IP (with discouragement annotation)
06 — 0.0.0.0/0 + ::/0 (RFC-aligned alternative to '*')
07 — mixed list (literal + CIDR + sentinel)
08 — backward-compat singular ipAddress
09 — DNS literal
10 — DNS leading '*' (RFC 4592)
11 — DNS mid '⋯' (DynamicIdentifier)
12 — DNS trailing '*' (one or more, never zero)
13 — trailing-dot normalisation
14 — '**' recursive — admission MUST reject
15 — egress + ingress on same container, direction isolation
16 — egress: [] NONE (declared zero-egress)
17 — realistic Stripe API + cluster DNS
18 — Kubernetes service-FQDN via mid '⋯' (the user's case)
19 — port + protocol + CIDR composed
20 — multi-container pod, different rules per container
README.md indexes all fixtures and lists the wildcard token vocabulary.
Each fixture's header comment lists the edge case, expected outcomes,
match path, spec reference, and operational guidance.
Ready to be consumed by node-agent's Test_34_NetworkWildcardSurface
(forthcoming) and by storage's networkmatch unit tests via
testdata-style references.
Replaces byte-equality with the v0.0.2 wildcard-aware matchers from
storage's pkg/registry/file/networkmatch — applied symmetrically to
all six nn.* CEL functions (egress + ingress mirror images):
nn.was_address_in_egress / _in_ingress
nn.is_domain_in_egress / _in_ingress
nn.was_address_port_protocol_in_egress / _in_ingress
Each function now walks BOTH the deprecated singular field
(IPAddress / DNS, byte-equality, back-compat) AND the new plural
field (IPAddresses / DNSNames, wildcard-aware) on each NetworkNeighbor
entry. A profile that uses only the deprecated form behaves exactly
as before; a profile that uses the new form gains CIDR + wildcard
matching with no rule-side changes required.
Two helpers (neighborMatchesIP / neighborMatchesDNS) factor the
two-list walk so the six call sites stay readable. Compiled-form
caching of the matcher across calls is deferred to a follow-up — the
existing cel functionCache still memoises (containerID, observed)
tuples, so the per-call MatchIP/MatchDNS overhead only fires on
cache misses.
Tests cover:
- CIDR membership across egress/ingress
- '*' sentinel for any IP
- leading-* DNS wildcard (RFC 4592, exactly one label)
- mid-⋯ DynamicLabel (the kubernetes service-FQDN case)
- trailing-dot resilience
- direction isolation (egress and ingress lists are walked
independently — same address allowed on one direction
must NOT match the other)
- back-compat: deprecated singular IPAddress/DNS still works
- mixed: profile with one entry using singular, another using plural
- composed match: CIDR + port + protocol on the granular variant
go.mod: temporary local-path replace for kubescape/storage so the
node-agent picks up the in-flight feat/network-wildcards work; user
flips back to fork ref before pushing.
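A local-path replace of roughly this shape (module path assumed from the kubescape org layout) is what keeps the in-flight storage work buildable during development:

```
replace github.com/kubescape/storage => ../storage
```

It must be reverted to a published fork ref before pushing, since a filesystem-relative replace only resolves on machines with the sibling checkout.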
TestFixturesParse: every YAML under tests/resources/network-wildcards/
parses against the v1beta1 NetworkNeighborhood schema. The fixtures
double as authoritative user-facing syntax documentation, so a fixture
that fails to parse is a documentation bug.
TestFixturesMatchExpectedBehaviour: representative observed→match
triples for each major edge case (literal IP, CIDR, '*' sentinel,
deprecated singular IPAddress, leading-* DNS RFC 4592, mid-⋯
DynamicLabel, direction isolation between egress and ingress) are
exercised through the actual nn.* CEL functions. If a fixture's header
comment says '10.1.2.3 → match' and the matcher disagrees, ONE of them
is wrong; this test pins both.
True end-to-end Test_34_NetworkWildcardSurface (kubectl-applies the
fixtures against a live cluster) belongs in the iximiuz lab; that job
is left for the lab pass once the storage + node-agent images ship via
the fork CI.
Local replace points at ../storage so the fork ref isn't fetched. User reverts both go.mod and go.sum before pushing the branch.
Updates the storage replace to a pseudo-version on the fork that includes the v0.0.2 wildcard surface (pkg/registry/file/networkmatch/, IPAddresses schema field, REST validation). Build and tests stay green against the pinned ref. The .claude/ entry on .gitignore prevents the agent state directory from being tracked accidentally.
Five findings, all legit, all fixed:
- Port range guard (Major): wasAddressPortProtocolInEgress/Ingress now
reject portInt outside [0, 65535] BEFORE narrowing to int32. Without
this, a CEL value like 4294967739 wraps to 443 and would falsely match
a port-443 entry. New TestWasAddressPortProtocolInEgress_PortWrapRejected
pins the contract.
- neighborMatchesDNS now routes the deprecated singular DNS field
through MatchDNS (single-element slice) instead of raw string equality,
so back-compat behaviour gets the same trailing-dot stripping +
lowercasing as the new DNSNames[]. New
TestIsDomainInEgress_DeprecatedDNS_TrailingDotParity pins this.
- Direction-isolation fixture test now exercises BOTH
wasAddressInEgress and wasAddressInIngress for each observation, via a
new ipBothCheck struct. The prior version only checked egress, so a
regression that broke ingress matching would have slipped through.
- TestFixturesParse uses yaml.UnmarshalStrict so a typo in any
user-facing fixture (the YAML files double as documentation) fails the
test instead of silently parsing.
- README clarifies that fixture 14 is intentionally rejected at
admission and shouldn't be kubectl-applied — points readers at the
index entry so they don't try to use it as a template.
Also bumps the storage replace to e1263bf6, which carries storage's CR
fixes (deprecated IPAddress validation, ValidateUpdate now also runs
network-profile validation, field-path assertions in admission tests).
Pulls in storage's CR round-2 fixes: deterministic admission error ordering across container groups, and field-path assertions on the ValidateUpdate test.
Pulls in storage's deprecated-DNS validation parity fix.
Two findings, both nitpick-level, both applied:
- Remove the unused 'maps', 'objectcache', 'objectcachev1' imports from
fixtures_test.go along with the blank-identifier _ = ... lines at the
bottom that existed only to silence the unused-import error.
buildLibWithContainer is defined in wildcard_test.go (same package), so
fixtures_test.go has no real need for those imports.
- Route the deprecated singular IPAddress through networkmatch.MatchIP
for symmetry with the deprecated singular DNS (which round 1 already
routed through MatchDNS). Both deprecated fields now get the same
canonicalisation (IPv6 expanded forms, IPv4-mapped IPv6) as the new
list fields. New
TestWasAddressInEgress_DeprecatedIPAddress_IPv6Canonicalisation pins
this.
…nd 3)
CR caught that the round-2 routing of deprecated IPAddress through
MatchIP had a documentation gap: existing tests only proved literal +
canonical (IPv6) matching, never the wildcard/CIDR semantics that
MatchIP now also enables on the deprecated field.
Adds TestWasAddressInEgress_DeprecatedIPAddress_AcceptsWildcardAndCIDR
which pins the contract: deprecated singular field accepts the SAME
wildcard token vocabulary as the new list form — '*' sentinel, CIDRs,
0.0.0.0/0 and ::/0 alternatives. Comment on neighborMatchesIP documents
this is intentional unification, not accidental.
…scape#803) Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Profile-checksum-invalidated cache of compiled networkmatch.IPMatcher /
DNSMatcher per (containerID, neighborIndex). The previous code path
re-compiled every NetworkNeighbor's entries on each CEL function-cache
miss; this PR builds each matcher at most once per profile-checksum
lifetime and reuses it across subsequent misses.
Design:
matcherCache (sync.Map) inside nnLibrary, zero-value safe so existing
test fixtures that construct nnLibrary{} directly continue to work
without changes.
Per-container entry tagged with the profile's SyncChecksumMetadataKey
annotation. On lookup: if checksum matches, reuse; else allocate a
fresh containerMatchers and store with LoadOrStore (concurrent-safe).
Per-neighbor matchers are nil-init and lazily compiled on first use,
so a profile with 10 egress entries that only ever fires through 2 of
them pays compile cost for only those 2.
Benchmarks (arm64, -benchtime=1s):
IP, realistic profile (5 neighbors x 3 entries, observation misses all):
Cold (per-call recompile): 1733 ns/op 1920 B/op 76 allocs/op
Hot (cached matchers) : 177 ns/op 32 B/op 2 allocs/op
~ -90% time, -98% bytes, -97% allocs
DNS, realistic profile:
Cold: 1219 ns/op 1800 B/op 41 allocs/op
Hot : 318 ns/op 272 B/op 7 allocs/op
~ -74% time, -85% bytes, -83% allocs
Churning profile (checksum flips every iteration — pathological):
1527 ns/op 1936 B/op 77 allocs/op
Matches cold path: cache overhead itself is negligible; the savings
come strictly from amortising compile across stable-checksum windows.
In production this stacks on top of the existing CEL functionCache
(which already absorbs same-(containerID,observed) cache hits). The
matcher cache catches what slips through: unique-observation cache
misses within a profile-checksum lifetime.
Touched:
- matcher_cache.go new file: cache impl
- matcher_cache_bench_test.go new file: comparison bench
- network.go use cached matchers in all 6 CEL fns
- nn.go matcherCache field on nnLibrary
…s replace (CR #42)
Two findings from CodeRabbit round 1, both fixed:
1. Stale-entry shape race in getOrBuild (Major)
Old code used LoadOrStore on the staleness path and only replaced on
checksum mismatch — but a shape mismatch (neighbor count change) could
leak the stale entry to a caller whose profile has a different shape,
which then index-panics in ipMatcher/dnsMatcher.
Fix: when staleness is detected (by checksum OR shape), always Store
unconditionally. Worst-case contention: several goroutines build
shape-correct fresh entries and one Store wins; all callers still see
a shape-correct entry. Orphans get GC'd.
2. Unsynchronised lazy-init of per-neighbor matchers (Critical)
neighborMatchers.ip / .dns were *Matcher with a non-atomic 'if nil
then build then assign' pattern — a real data race.
Fix: switched to atomic.Pointer[networkmatch.IPMatcher] (and DNS).
First-build callers may race on Compile but only one pointer wins via
CompareAndSwap; everyone returns the winning matcher. Pure functions
(no shared state) so duplicate Compile work is wasteful but not
incorrect.
New tests in matcher_cache_test.go pin the contract:
- TestMatcherCache_ConcurrentFirstBuild: 64 goroutines racing on the
same slot, run under -race, asserts matchers are populated exactly once
- TestMatcherCache_StaleEntryReplaced: shape-mismatch path returns a
fresh containerMatchers, not the stale one
- TestMatcherCache_ChecksumPreservedAcrossCalls: same checksum hits
cache (no rebuild)
Benchmarks re-run after atomic.Pointer switch — negligible impact
(177 → 186 ns/op, still 8x faster than cold path). All headline
savings preserved.
…und 2)
Without the barrier, goroutine launch jitter staggers first-call
arrivals, hiding any unsynchronised-write data race during the
first-build window. With the barrier, all 64 goroutines hit the
contended path simultaneously when close(start) fires — much tighter
race-detector coverage of the atomic.Pointer.CompareAndSwap path.
Upstream brings:
- kubescape#788 Replace AP+NN cache with ContainerProfileCacheImpl (51 files)
- kubescape#799 Rule-Aware Profile Projection (53 files, projection_apply/compile/trie)
- kubescape#800 fix(exec path symmetric resolver)
- kubescape#797 learning-period label
- kubescape#798 syft v1.32.0-ks.2 perf
- kubescape#791 client CA from kubelet config
- kubescape#803 logging improvements
- kubescape#772 services from API
Conflicts resolved by taking upstream's version for the reshape-owned
packages (containerprofilecache, metricsmanager, profilehelper,
applicationprofile/*, rule_manager, rulepolicy, mock.go) and dropping
the matcher_cache.go layer entirely — upstream's
ProjectedContainerProfile already pre-compiles + caches per-profile,
so a per-call matcher cache on top would duplicate effort.
Fork-only tamper detection wiring restored on the new
ContainerProfileCacheImpl struct (tamperAlertExporter + tamperEmitted
fields). cmd/main.go updated to the new NewRulesWatcher signature.
Storage pinned to k8sstormcenter/storage@b23d85f0
(merge/upstream-profile-rearch of the storage fork, which carries our
networkmatch + IPAddresses schema + upstream's clean-standalone-pods
PR).
Wildcard behavior recovery (re-routing nn.* CEL functions through
networkmatch on the projected Patterns surface) lands as the next
commit; this one is the mechanical merge landing point.
…ction
Upstream's projection-v1 (PR kubescape#799) explicitly dropped two
pieces of behaviour that the fork's earlier wildcard work relied on:
1. Network-surface Patterns (CIDRs, '*' sentinels, DNS
leading-/mid-/trailing-wildcards) were never populated because
projectField only routed entries to Patterns on path surfaces.
2. wasExecutedWithArgs() degraded to path-only matching — 'args list
is validated but not matched against', with a comment that
ExecArgsByPath is 'future work'.
This commit re-introduces both, layered cleanly on top of the
projection rather than working around it:
Network wildcards (spec §5.7, §5.8)
- projectField: third parameter is now an isDynamic classifier rather
than the isPathSurface bool. Path surfaces pass containsDynamicSegment;
network surfaces pass isNetworkIPWildcard or isNetworkDNSWildcard
(each fork-defined here).
- extractEgressAddresses / extractIngressAddresses now also pull the
v0.0.2 IPAddresses[] list-form alongside the deprecated singular
IPAddress (storage's networkmatch wildcards land in Patterns).
- CEL helpers (nn.was_address_in_*, nn.is_domain_in_*) now consult
Values, All, and Patterns via networkmatch.MatchIP / MatchDNS.
- matchIPField canonicalises observed IPs (net.ParseIP) so IPv6
expanded forms and ::ffff: IPv4-mapped addresses hit the Values fast
path.
- matchDNSField normalises trailing dots on observed and tries both
forms against Values.
Exec-args matching restored
- ProjectedContainerProfile gains ExecsByPath map[string][]string —
the per-Path Args slice from cp.Spec.Execs.
- extractExecsByPath populates it in projection_apply.
- wasExecutedWithArgs runs dynamicpathdetector.CompareExecArgs against
the matched profile entry. Back-compat: a path with no ExecsByPath
entry matches with no argv constraint (preserves old
wasExecuted-equivalent behaviour for partial profiles).
Mock parity
- RuleObjectCacheMock.GetProjectedContainerProfile now routes the same
classifications (network wildcards → Patterns, path dynamics →
Patterns, ExecsByPath populated). Tests no longer need a real cache.
- ensureProjectedAllInit no longer mis-sets All=true (that's the
match-any sentinel, not a comprehensiveness hint).
Tamper detection survives
- tamperAlertExporter + tamperEmitted fields re-added to the new
ContainerProfileCacheImpl so the R1016 wiring keeps working.
- exporters import added.
Storage pin: k8sstormcenter/storage @ b23d85f0
(merge/upstream-profile-rearch, which carries networkmatch +
IPAddresses schema + upstream's clean-standalone-pods).
Known port/protocol regression (degrade-noted in tests):
- was_address_port_protocol_in_egress / _in_ingress still degrade to
address-only — port/protocol granularity needs an AddressPortsByAddr
projection field which upstream noted as future work. Updated unit
tests document the degradation; the only production rules that would
exercise this didn't use the helper anyway.
Full suite: 46/48 packages green. 2 failing
(containerwatcher/v2/tracers, validator) are pre-existing eBPF
kernel-privilege issues that reproduce on main without root.
Note: Reviews paused — this branch is under active development, so
CodeRabbit has automatically paused its review to avoid an influx of
comments on new commits.
📝 Walkthrough
Replaces raw container-profile storage with projection-driven
ProjectedContainerProfile; adds CompileSpec/Apply/trie matchers;
integrates projection into the cache and reconciler; rewires CEL
libraries to use projected views with cache-keying and helper metrics;
expands tests, fixtures, charts, and bootstrap wiring.
Changes: Projection & CEL migration
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/component-tests.yaml:
- Line 114: Update the GitHub Actions step that references the setup action:
replace the usage of actions/setup-go@v4 with actions/setup-go@v5 to adopt the
recommended version; locate the workflow step that currently uses the symbol
"actions/setup-go@v4" and change it to "actions/setup-go@v5", ensuring no other
changes to the step configuration are required and the Go version inputs remain
intact.
In `@cmd/main.go`:
- Around line 398-401: When calling config.LoadServiceURLs in main (the block
that sets failureReporter via sbommanagerv1.NewHTTPSbomFailureReporter), ensure
failures are logged: if svcErr != nil log a warning/error with svcErr.Error()
and contextual message that service URL load failed, and if svcErr == nil but
services.GetReportReceiverHttpUrl() == "" log a warning that SBOM failure
reporting is disabled due to empty report receiver URL; keep the existing
successful-case log for enabled reporting. This ensures both LoadServiceURLs
errors and empty URL cases are explicitly logged with context.
In `@pkg/objectcache/containerprofilecache/projection_apply.go`:
- Around line 217-228: extractExecsByPath currently aliases
cp.Spec.Execs[*].Args into the returned map which permits external mutation of
the original ContainerProfile; to fix, ensure you clone each ExecCall.Args slice
before assigning into the map inside extractExecsByPath (detect nil vs empty
slice properly), i.e. for each e in cp.Spec.Execs copy the contents of e.Args
into a new slice and store that under m[e.Path] so mutations of the projected
map do not affect cp.Spec.Execs.
In `@pkg/rulebindingmanager/cache/cache.go`:
- Around line 200-202: The loop that does a blocking send to each notifier
(iterating over notifiers and doing *n <-
rulebindingmanager.RuleBindingNotify{}) can hang RefreshRuleBindingsRules() if a
receiver is slow; change the fan-out to perform non-blocking sends or bounded
enqueue with drop/coalesce. Replace the direct blocking send in the loop with a
select that attempts to send on *n and falls back to either a default branch
(drop the event) or a timed case (e.g., use a short timeout) so slow/unread
channels do not block; alternatively ensure notifier channels are buffered and
coalesce by only sending when the channel is empty to avoid queuing duplicates.
Target the notifier send in the loop over notifiers and update the send strategy
accordingly.
In `@pkg/rulemanager/cel/libraries/applicationprofile/open_test.go`:
- Around line 203-208: The test in open_test.go weakens the
ap.was_path_opened_with_flags contract by expecting true when flags ("O_WRONLY")
don't match the recorded event; update the test case (the one with name "Path
matches but flags don't match" / containerID "test-container-id" / path
"/etc/passwd") to set expectedResult to false so the
ap.was_path_opened_with_flags helper continues to require flag equality, or
alternatively rename/deprecate the helper to a path-only name if you
intentionally want path-only semantics; ensure the change is made in the test
case for ap.was_path_opened_with_flags so behavior remains consistent.
In `@pkg/rulemanager/cel/libraries/cache/function_cache.go`:
- Around line 87-91: The code currently assumes values[0] is non-nil before
calling values[0].Value(), which can panic; update the guard in the function
that uses the values slice (the branch that checks "if len(values) == 0 || oc ==
nil") to also verify values[0] != nil (and ideally values[0].Value() != nil)
before calling Value(); if nil, return "" same as when the slice is empty, then
proceed with the type assertion to containerIDStr only after this nil-check to
avoid panics during rule evaluation.
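The layered guard could be sketched as follows. The `ref` interface here is a simplified stand-in for the relevant slice of cel-go's `ref.Val` (only `Value()` is modeled); the helper name and the string fallback are assumptions.

```go
package main

import "fmt"

// ref models the one method of the CEL value interface we need.
type ref interface{ Value() any }

type strVal string

func (s strVal) Value() any { return string(s) }

// containerIDFromArgs extracts the container-ID argument defensively:
// empty slice, nil element, nil inner value, or a non-string value
// all degrade to "" instead of panicking mid rule evaluation.
func containerIDFromArgs(values []ref) string {
	if len(values) == 0 || values[0] == nil {
		return ""
	}
	v := values[0].Value()
	if v == nil {
		return ""
	}
	s, ok := v.(string)
	if !ok {
		return ""
	}
	return s
}

func main() {
	fmt.Println(containerIDFromArgs(nil) == "")                     // prints "true"
	fmt.Println(containerIDFromArgs([]ref{nil}) == "")              // prints "true"
	fmt.Println(containerIDFromArgs([]ref{strVal("cid")}) == "cid") // prints "true"
}
```

The comma-ok type assertion replaces the bare assertion mentioned in the finding, so a non-string value is treated the same as a missing one.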
In `@pkg/rulemanager/cel/libraries/networkneighborhood/network.go`:
- Around line 43-45: The code currently treats ProjectedField.All as an
unconditional match by returning true for any field that has field.All set;
remove that unconditional return and instead treat ProjectedField.All as a
projection/retention mode (i.e., do not implicitly match unknown IPs/DNS) —
change the branch in the function containing "if field.All { return true }" so
it either skips special-casing (continue evaluation) or only matches when the
profile explicitly contains the value, and implement proper wildcard handling
via an explicit wildcard flag (e.g., ProjectedField.Wildcard) if wildcard
semantics are required; apply the same change to the other occurrence around the
second block (lines 66-68).
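Without the `All` short-circuit, a match is only ever produced by the projected data itself. The sketch below assumes a simplified `ProjectedField` shape (literal entries in `Values`, explicit wildcard patterns in `Patterns`; field names and the leading-wildcard rule are assumptions), so an unknown observation always falls through to false.

```go
package main

import (
	"fmt"
	"strings"
)

// ProjectedField is an assumed stand-in: Values holds literal entries
// (in pass-through retention mode, everything observed), Patterns
// holds explicit wildcard patterns. There is deliberately no All flag
// acting as a match-anything sentinel.
type ProjectedField struct {
	Values   map[string]struct{}
	Patterns []string
}

// matchDNS has NO "All => true" short-circuit: a name matches only if
// it is literally in Values or hits an explicit wildcard pattern.
func matchDNS(f ProjectedField, name string) bool {
	if _, ok := f.Values[name]; ok {
		return true
	}
	for _, p := range f.Patterns {
		// leading-wildcard pattern like "*.cluster.local"
		if strings.HasPrefix(p, "*.") && strings.HasSuffix(name, p[1:]) {
			return true
		}
	}
	return false
}

func main() {
	f := ProjectedField{
		Values:   map[string]struct{}{"api.stripe.com": {}},
		Patterns: []string{"*.cluster.local"},
	}
	fmt.Println(matchDNS(f, "api.stripe.com"))    // prints "true"
	fmt.Println(matchDNS(f, "svc.cluster.local")) // prints "true"
	fmt.Println(matchDNS(f, "evil.example.com"))  // prints "false"
}
```

Pass-through retention projects every raw entry into `Values`, so the literal lookup still covers that mode; nothing unknown can match by accident.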
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 79cb8537-b2c8-4c97-9adf-2009c1d5b532
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (79)
.github/workflows/component-tests.yaml
.gitignore
README.md
cmd/main.go
go.mod
pkg/config/config.go
pkg/containerprofilemanager/v1/event_reporting.go
pkg/containerprofilemanager/v1/event_reporting_test.go
pkg/metricsmanager/metrics_manager_interface.go
pkg/metricsmanager/metrics_manager_mock.go
pkg/metricsmanager/metrics_manager_noop.go
pkg/metricsmanager/prometheus/prometheus.go
pkg/objectcache/containerprofilecache/containerprofilecache.go
pkg/objectcache/containerprofilecache/containerprofilecache_test.go
pkg/objectcache/containerprofilecache/init_eviction_test.go
pkg/objectcache/containerprofilecache/lock_stress_test.go
pkg/objectcache/containerprofilecache/projection_apply.go
pkg/objectcache/containerprofilecache/projection_apply_test.go
pkg/objectcache/containerprofilecache/projection_compile.go
pkg/objectcache/containerprofilecache/projection_compile_test.go
pkg/objectcache/containerprofilecache/projection_trie.go
pkg/objectcache/containerprofilecache/projection_trie_test.go
pkg/objectcache/containerprofilecache/reconciler.go
pkg/objectcache/containerprofilecache/reconciler_test.go
pkg/objectcache/containerprofilecache/shared_pointer_race_test.go
pkg/objectcache/containerprofilecache/t8_overlay_refresh_test.go
pkg/objectcache/containerprofilecache_interface.go
pkg/objectcache/projection_types.go
pkg/objectcache/v1/mock.go
pkg/rulebindingmanager/cache/cache.go
pkg/rulemanager/cel/cel.go
pkg/rulemanager/cel/libraries/applicationprofile/ap.go
pkg/rulemanager/cel/libraries/applicationprofile/capability.go
pkg/rulemanager/cel/libraries/applicationprofile/exec.go
pkg/rulemanager/cel/libraries/applicationprofile/exec_test.go
pkg/rulemanager/cel/libraries/applicationprofile/http.go
pkg/rulemanager/cel/libraries/applicationprofile/open.go
pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
pkg/rulemanager/cel/libraries/applicationprofile/syscall.go
pkg/rulemanager/cel/libraries/cache/function_cache.go
pkg/rulemanager/cel/libraries/networkneighborhood/fixtures_test.go
pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go
pkg/rulemanager/cel/libraries/networkneighborhood/network.go
pkg/rulemanager/cel/libraries/networkneighborhood/network_test.go
pkg/rulemanager/cel/libraries/networkneighborhood/nn.go
pkg/rulemanager/cel/libraries/networkneighborhood/wildcard_test.go
pkg/rulemanager/profilehelper/profilehelper.go
pkg/rulemanager/rule_manager.go
pkg/rulemanager/rulepolicy.go
pkg/rulemanager/types/v1/profiledata.go
pkg/rulemanager/types/v1/profiledata_test.go
pkg/rulemanager/types/v1/types.go
tests/chart/crds/rules.crd.yaml
tests/chart/templates/node-agent/configmap.yaml
tests/chart/templates/node-agent/default-rules.yaml
tests/chart/values.yaml
tests/component_test.go
tests/resources/malicious-job.yaml
tests/resources/network-wildcards/01-literal-ipv4.yaml
tests/resources/network-wildcards/02-literal-ipv6.yaml
tests/resources/network-wildcards/03-cidr-ipv4.yaml
tests/resources/network-wildcards/04-cidr-ipv6.yaml
tests/resources/network-wildcards/05-any-ip-sentinel.yaml
tests/resources/network-wildcards/06-any-as-cidr.yaml
tests/resources/network-wildcards/07-mixed-ip-list.yaml
tests/resources/network-wildcards/08-deprecated-ipaddress.yaml
tests/resources/network-wildcards/09-dns-literal.yaml
tests/resources/network-wildcards/10-dns-leading-wildcard.yaml
tests/resources/network-wildcards/11-dns-mid-ellipsis.yaml
tests/resources/network-wildcards/12-dns-trailing-star.yaml
tests/resources/network-wildcards/13-dns-trailing-dot-normalisation.yaml
tests/resources/network-wildcards/14-recursive-star-rejected.yaml
tests/resources/network-wildcards/15-egress-and-ingress.yaml
tests/resources/network-wildcards/16-egress-none.yaml
tests/resources/network-wildcards/17-realistic-stripe-api.yaml
tests/resources/network-wildcards/18-cluster-dns-via-mid-ellipsis.yaml
tests/resources/network-wildcards/19-port-protocol-with-cidr.yaml
tests/resources/network-wildcards/20-multi-container-mixed-wildcards.yaml
tests/resources/network-wildcards/README.md
Two findings on code introduced by the wildcards recovery commit; the other 5 findings touch upstream code I didn't modify in this rebase and are out of scope. CRITICAL — drop field.All short-circuit in matchIPField/matchDNSField: ProjectedField.All is the producer-side flag set by projectField when no rule declared profileDataRequired for the surface (pass-through retention mode). In that mode projectField already populates Values with every raw entry, so the Values lookup catches the match. Treating All=true as a 'match any input' sentinel in the consumer would let an unknown IP/DNS match even when absent from the profile — a false-positive admission bug. Removed both All short-circuits. Values + Patterns lookups cover the semantic correctly: pass-through projects everything into Values; rule-declared mode filters Values to the declared subset and routes wildcards to Patterns. Either way, an unknown observation falls through to false. MAJOR — clone Args slice in extractExecsByPath: Apply() is contract-bound to be a pure transform of the source profile. extractExecsByPath was aliasing cp.Spec.Execs[i].Args into the projected map, so a consumer mutating the projected slice could silently corrupt the underlying CRD pointer. Cloned via copy() so Apply stays observably pure. Skipped (upstream code not touched in this rebase): - actions/setup-go@v4 in component-tests.yaml - Silent SBOM URL error in cmd/main.go - Blocking channel send in rulebindingmanager/cache.go - Pre-existing was_path_opened_with_flags behaviour in open_test.go - Nil CEL arg guard in function_cache.go
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness: No data available.
@coderabbitai resolve
✅ Actions performed: Comments resolved and changes approved.
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness: No data available.
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness (AFTER only)
Event Counters
@coderabbitai review
✅ Actions performed: Review triggered.
…tract End-to-end unit test for the Test_32 dependency chain: projectUserProfiles → mergeApplicationProfile → Apply → projectField(Execs). Asserts that user-overlay AP Execs land in projected.Execs.Values keyed by path (both full-path "/bin/sh" and bare-name "sh" variants) AND in projected.ExecsByPath for the was_executed_with_args matcher. This passes locally — which proves the projection logic isn't the cause of Test_32's R0001-precondition failure in CT. The gap must be elsewhere: stale function_cache, reconciler timing, or cache-key invalidation not tracking entry updates. Task kubescape#61 tracks the live-agent diagnosis.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/objectcache/containerprofilecache/tamper_alert_test.go`:
- Around line 255-280: Add a second test case that constructs a profile which
appears signed (set the annotation(s) that make IsSigned return true) but
deliberately triggers a non-tamper verifier error (e.g., malformed/invalid
signature annotation that causes decode/hash/verifier-construction to fail) so
verifyUserApplicationProfile runs the verifier path but fails with an
operational error; reuse ContainerProfileCacheImpl, call SetTamperAlertExporter
with captureExporter, invoke verifyUserApplicationProfile on that
signed-but-malformed profile, and assert exporter.ruleAlerts() length is 0 (no
R1016 alerts).
In `@tests/component_test.go`:
- Around line 3473-3493: The test currently treats absence of R0001 as proof the
matcher ran, but ExecIntoPod failures also yield zero alerts; after each call to
ExecIntoPod in the affected subtests (the calls that precede using alerts
at/around the assertR0001Silent checks and the later R0040 assertions) add an
explicit require.NoError(t, err) to fail the test if the exec failed—locate the
ExecIntoPod invocations in these subtests and insert require.NoError(t, err)
immediately after each call so the test only proceeds to inspect alerts when the
exec actually succeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: f5bebb7d-36bc-4e09-8db4-2dc13478b6b8
📒 Files selected for processing (4)
.github/workflows/component-tests.yamlpkg/objectcache/containerprofilecache/containerprofilecache.gopkg/objectcache/containerprofilecache/tamper_alert_test.gotests/component_test.go
// TestVerifyAP_OperationalError_DoesNotEmit pins the inverse contract:
// when verification fails with a non-tamper error (hash compute,
// verifier construction, decode), the exporter must NOT receive an
// R1016 — operational errors are logged and either dropped or surfaced
// via strict-mode loading refusal, but never as a tamper alert.
func TestVerifyAP_OperationalError_DoesNotEmit(t *testing.T) {
	// Construct an AP with an UNSIGNED-looking annotation set so
	// IsSigned returns false — verify exits early without invoking the
	// cosign path at all. Confirms the unsigned short-circuit emits
	// nothing.
	profile := &v1beta1.ApplicationProfile{
		ObjectMeta: metav1.ObjectMeta{
			Name:            "unsigned",
			Namespace:       "test-ns",
			ResourceVersion: "1",
		},
	}

	exporter := &captureExporter{}
	c := &ContainerProfileCacheImpl{}
	c.SetTamperAlertExporter(exporter)

	c.verifyUserApplicationProfile(profile, "wlid://test/cluster/ns/Pod/p")
	if got := len(exporter.ruleAlerts()); got != 0 {
		t.Errorf("unsigned AP produced %d R1016 alerts; want 0", got)
	}
Operational-error coverage is missing in this test path.
This test builds an unsigned profile, so verifyUserApplicationProfile short-circuits before verifier execution. It won’t catch regressions where decode/hash/verifier-construction errors are misclassified and emit R1016. Please add a signed-profile case that forces a non-tamper verifier error and assert zero alerts there.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/objectcache/containerprofilecache/tamper_alert_test.go` around lines 255
- 280, Add a second test case that constructs a profile which appears signed
(set the annotation(s) that make IsSigned return true) but deliberately triggers
a non-tamper verifier error (e.g., malformed/invalid signature annotation that
causes decode/hash/verifier-construction to fail) so
verifyUserApplicationProfile runs the verifier path but fails with an
operational error; reuse ContainerProfileCacheImpl, call SetTamperAlertExporter
with captureExporter, invoke verifyUserApplicationProfile on that
signed-but-malformed profile, and assert exporter.ruleAlerts() length is 0 (no
R1016 alerts).
// R0001 silence is a precondition for every subtest below: it means
// parse.get_exec_path resolved to the profile's Path key, so R0040
// gets to evaluate its argv comparison cleanly. A non-zero R0001 for
// the test binary's comm means the recording / capture / resolution
// chain dropped event.exepath — that's a separate bug (track it in
// the recording side, not in R0040), and asserting it here fails the
// subtest on the right axis instead of polluting the R0040 signal.
assertR0001Silent := func(t *testing.T, alerts []testutils.Alert, comm string) {
	t.Helper()
	n := 0
	for _, a := range alerts {
		if a.Labels["rule_id"] == "R0001" && a.Labels["comm"] == comm {
			n++
		}
	}
	require.Zero(t, n,
		"R0001 precondition: path resolution failed for comm=%q. "+
			"parse.get_exec_path either didn't receive event.exepath or "+
			"profile Path doesn't match its return value. Fix capture-side "+
			"exepath before reading R0040 results from this subtest.", comm)
}
Require the exec to succeed before treating R0001 silence as proof the matcher was exercised.
Line 3488 only proves no R0001 alert was seen. In the "must stay silent" subtests, a failed ExecIntoPod also gives you zero R0001 and zero R0040, so the test can pass without ever executing the target binary. Add require.NoError(t, err) right after each ExecIntoPod before evaluating alerts.
Also applies to: 3508-3510, 3545-3547
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/component_test.go` around lines 3473 - 3493, The test currently treats
absence of R0001 as proof the matcher ran, but ExecIntoPod failures also yield
zero alerts; after each call to ExecIntoPod in the affected subtests (the calls
that precede using alerts at/around the assertR0001Silent checks and the later
R0040 assertions) add an explicit require.NoError(t, err) to fail the test if
the exec failed—locate the ExecIntoPod invocations in these subtests and insert
require.NoError(t, err) immediately after each call so the test only proceeds to
inspect alerts when the exec actually succeeded.
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness (AFTER only)
Event Counters
…2 root cause)
The rulemanager's function_cache invalidation key (
pkg/rulemanager/cel/libraries/cache/function_cache.go:
HashForContainerProfile) is SpecHash + "|" + SyncChecksum. SpecHash
only changes when rule definitions change. SyncChecksum was being
read straight from cp.Annotations[SyncChecksumMetadataKey], where cp
is the merged ContainerProfile inheriting only the baseline's
annotations.
For containers labelled with kubescape.io/user-defined-profile, the
recording side correctly skips learning into the baseline (see
pkg/containerprofilemanager/v1/lifecycle.go:100). The reconciler then
synthesises an empty effectiveCP with only CompletionMetadataKey and
StatusMetadataKey annotations — NO SyncChecksumMetadataKey. After
user-overlay merge, the projected SyncChecksum stayed "" indefinitely.
Result: a cached "ap.was_executed=false" lookup computed during the
no-overlay window (e.g. first projection pass with a transient overlay
fetch error) survives a subsequent reconcile that successfully fetches
and merges the overlay — because the cache key didn't change. R0001
fires forever on user-overlay-allowed paths. Test_32 trips its R0001
precondition. Test_28's nslookup R0001 fires unnoticed.
Fix: in projectUserProfiles, after the merge, append user-overlay
identity (ns/name@RV) to the merged profile's SyncChecksumMetadataKey
annotation. Apply propagates it to ProjectedContainerProfile
.SyncChecksum, which the function_cache key includes — so the cache
key flips when an overlay arrives, when an overlay updates (RV bump),
and when an overlay disappears.
Unit-pinned via:
- TestT32_UserOverlayExecsReachProjectedValues (projection-pipeline
smoke — confirms the merge logic itself works; passed BEFORE this
fix already)
- TestT32_SyncChecksumReflectsUserOverlayIdentity (the actual bug
pin — FAILED before this fix, passes after)
Resolves the latent rule-evaluator wiring gap behind Test_32 and the
silent R0001 noise in Test_28.
…cape#61) Diagnostic warning to capture exactly what's in the projected container profile's Execs.Values at the moment a was_executed lookup returns false. Compared against the agent's merge log (which claims the user-overlay's Execs were appended), this will pinpoint whether the projection actually contains the merged paths at decision time. Remove this diagnostic after the merge-vs-evaluator gap is closed. The user-overlay SyncChecksum fix (f80f349) did NOT silence R0001 on user-overlay-allowed paths in CT — so the bug is elsewhere (not cache invalidation). This diagnostic narrows it.
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness: No data available.
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness (AFTER only)
Event Counters
@coderabbitai review
✅ Actions performed: Review triggered.
…faithful)

Inspektor Gadget's eBPF exec tracer follows the symlink and reports event.exepath as the kernel-resolved target. On busybox-based images every utility (sh, echo, nslookup, cat, ...) is a symlink to /bin/busybox — so exepath always resolves to /bin/busybox regardless of which symlink was actually invoked. User-authored ApplicationProfile entries naturally list the symlink form (/bin/sh, /usr/bin/nslookup) — that's how operators think about binaries.

With the prior "exepath wins unconditionally" precedence, parse.get_exec_path(... '/bin/busybox') returned /bin/busybox while profile.Path was /bin/sh. ap.was_executed missed → R0001 fired forever on every busybox symlink. Same for the recording-side resolveExecPath, which recorded /bin/busybox into auto-learned profiles, divorced from how users wrote their profiles.

Diagnostic captured the smoking gun (task kubescape#61, run 25871298210):
DIAG ap.was_executed lookup MISS queriedPath=/bin/busybox execsValuesKeys=[sleep, /bin/sh, sh, /bin/echo, echo, /usr/bin/curl, curl, /bin/sleep] -- merge worked, profile has /bin/sh, but query asks for /bin/busybox

Fix — new symlink-faithful precedence on BOTH layers:
1. argv[0] when it's an absolute path → symlink-as-invoked wins
2. exepath when argv[0] is bare/empty → preserves argv[0]-spoofing protection (bare argv[0]="sshd" while exec'ing /usr/bin/curl resolves to the real exepath)
3. argv[0] when bare and exepath empty (fexecve / AT_EMPTY_PATH)
4. comm as last resort

Tier 2 keeps the existing argv[0]-spoofing test green. The new "absolute argv[0] wins" rule is safe because the kernel only exposes an absolute argv[0] when execve was called with that path (the kernel follows the symlink itself; argv[0] reflects what the caller passed).

Pinned by unit tests on BOTH sides:
- parse/parsing_test.go::TestGetExecPath_SymmetryWithRecordingSide + busybox symlink (/bin/sh, /usr/bin/nslookup) + bare argv[0] keeps spoof protection
- containerprofilemanager/v1/event_reporting_test.go::TestResolveExecPath + busybox symlink (/bin/sh, /usr/bin/nslookup) + existing argv[0]-spoofing case stays green

Resolves task kubescape#60 (event.exepath empty for kubectl-exec'd processes — actually it wasn't empty, it was just resolved to the symlink target) and task kubescape#61 (user-overlay merge → rule-evaluator wiring gap — actually it wasn't the wiring, it was the path-key mismatch). Removes the diagnostic warn log from exec.go.
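The four-tier precedence described in the commit message could be sketched as below. This is a simplified stand-in, not the real resolveExecPath/parse.get_exec_path helpers; the function name and signature are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveExecPath sketches the symlink-faithful precedence:
//  1. absolute argv[0]: the symlink as invoked wins
//  2. exepath when argv[0] is bare/empty: keeps spoof protection
//  3. bare argv[0] when exepath is empty (fexecve / AT_EMPTY_PATH)
//  4. comm as last resort
func resolveExecPath(argv0, exepath, comm string) string {
	switch {
	case strings.HasPrefix(argv0, "/"):
		return argv0
	case exepath != "":
		return exepath
	case argv0 != "":
		return argv0
	default:
		return comm
	}
}

func main() {
	// busybox symlink: argv[0] is the invoked path, exepath the resolved target
	fmt.Println(resolveExecPath("/bin/sh", "/bin/busybox", "sh")) // prints "/bin/sh"
	// spoofed bare argv[0]: fall back to the kernel-resolved exepath
	fmt.Println(resolveExecPath("sshd", "/usr/bin/curl", "curl")) // prints "/usr/bin/curl"
	// fexecve: no exepath, bare argv[0]
	fmt.Println(resolveExecPath("sh", "", "sh")) // prints "sh"
}
```

Tier 1 only fires on an absolute argv[0], so the existing bare-argv[0] spoofing case still resolves through the kernel-reported exepath.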
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/objectcache/containerprofilecache/projection.go`:
- Around line 104-115: stampOverlayIdentity currently appends overlay suffixes
to the existing SyncChecksumMetadataKey causing accumulation; change it to
extract the baseline checksum (the substring before the first "|") from
projected.Annotations[helpersv1.SyncChecksumMetadataKey] (or treat empty as "")
and then rebuild the annotation value as baseline + "|" + overlayParts (instead
of starting from the full existing value), so repeated calls from
projectUserProfiles produce idempotent stamps; update stampOverlayIdentity to
parse and preserve only the baseline and then append the new "ap=..." and
"nn=..." segments when userAP/userNN are present.
In `@pkg/objectcache/containerprofilecache/test32_projection_test.go`:
- Around line 159-210: Add a final stage that tests the overlay-removal
transition: call projectUserProfiles(cp, nil, nil, pod, "curl") (same as Stage
1) after the updated overlay stage, then Apply(spec, mergedNilOverlay, tree) and
capture its projected.SyncChecksum; assert that this checksum differs from the
overlay-stamped values (projectedWithOverlay.SyncChecksum and
projectedWithUpdated.SyncChecksum) to ensure cache invalidation when the overlay
is removed. Use the same helper functions projectUserProfiles and Apply and
compare against syncAfter and syncThird (or their projected variables) and emit
a t.Errorf if the checksum did not change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 1cfdc657-0df0-42dd-88fb-d16198414fdf
📒 Files selected for processing (6)
pkg/containerprofilemanager/v1/event_reporting.gopkg/containerprofilemanager/v1/event_reporting_test.gopkg/objectcache/containerprofilecache/projection.gopkg/objectcache/containerprofilecache/test32_projection_test.gopkg/rulemanager/cel/libraries/parse/parse.gopkg/rulemanager/cel/libraries/parse/parsing_test.go
func stampOverlayIdentity(projected *v1beta1.ContainerProfile, userAP *v1beta1.ApplicationProfile, userNN *v1beta1.NetworkNeighborhood) {
	if projected.Annotations == nil {
		projected.Annotations = map[string]string{}
	}
	parts := []string{projected.Annotations[helpersv1.SyncChecksumMetadataKey]}
	if userAP != nil {
		parts = append(parts, "ap="+userAP.Namespace+"/"+userAP.Name+"@"+userAP.ResourceVersion)
	}
	if userNN != nil {
		parts = append(parts, "nn="+userNN.Namespace+"/"+userNN.Name+"@"+userNN.ResourceVersion)
	}
	projected.Annotations[helpersv1.SyncChecksumMetadataKey] = strings.Join(parts, "|")
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether projectUserProfiles can ever be called with an already-projected/stamped ContainerProfile.
# Expected result: no call path should feed the output of projection back into projectUserProfiles.
set -euo pipefail
echo "== projectUserProfiles call sites =="
rg -n -C3 '\bprojectUserProfiles\s*\(' pkg
echo
echo "== SyncChecksumMetadataKey readers/writers around container profile projection =="
rg -n -C3 'SyncChecksumMetadataKey' pkg/objectcache pkg/rulemanager

Repository: k8sstormcenter/node-agent
Length of output: 16278
Fix stampOverlayIdentity to reset the baseline checksum before appending overlay identity.
The current implementation at line 108 reads the existing SyncChecksumMetadataKey annotation and appends new overlay suffixes to it. However, projectUserProfiles is called twice in succession in both reconciler.go (lines 490–502) and containerprofilecache.go (lines 474–552), feeding the output of the first projection back as input to the second. This causes overlay suffixes to accumulate rather than replace, producing non-idempotent checksums.
Extract only the baseline checksum (the part before the first | delimiter) before appending new overlay identity, or preserve a separate baseline field that is never modified during overlay stamping.
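The requested idempotent variant could look like the sketch below. It is a simplified stand-in (a plain annotations map and string overlay IDs instead of the real profile types; the annotation key is an assumed placeholder): the baseline is cut off before the first "|" and the annotation is rebuilt from scratch, so repeated stamping converges.

```go
package main

import (
	"fmt"
	"strings"
)

// stampOverlayIdentity (idempotent sketch): keep only the baseline
// checksum, i.e. the part before the first "|", then rebuild the
// annotation from baseline + fresh overlay parts. Stamping the same
// identity twice yields the same value instead of accumulating.
func stampOverlayIdentity(annotations map[string]string, key, apID, nnID string) {
	baseline, _, _ := strings.Cut(annotations[key], "|")
	parts := []string{baseline}
	if apID != "" {
		parts = append(parts, "ap="+apID)
	}
	if nnID != "" {
		parts = append(parts, "nn="+nnID)
	}
	annotations[key] = strings.Join(parts, "|")
}

func main() {
	const key = "sync-checksum" // assumed placeholder for the real key
	ann := map[string]string{key: "abc123"}
	// double projection, as reconciler.go / containerprofilecache.go do:
	stampOverlayIdentity(ann, key, "ns/curl-32-overlay@12345", "")
	stampOverlayIdentity(ann, key, "ns/curl-32-overlay@12345", "")
	fmt.Println(ann[key]) // prints "abc123|ap=ns/curl-32-overlay@12345"
}
```

Calling it again with a different RV, or with no overlay at all, still flips the value, so the function_cache key invalidates on add, update, and removal alike.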
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/objectcache/containerprofilecache/projection.go` around lines 104 - 115,
stampOverlayIdentity currently appends overlay suffixes to the existing
SyncChecksumMetadataKey causing accumulation; change it to extract the baseline
checksum (the substring before the first "|") from
projected.Annotations[helpersv1.SyncChecksumMetadataKey] (or treat empty as "")
and then rebuild the annotation value as baseline + "|" + overlayParts (instead
of starting from the full existing value), so repeated calls from
projectUserProfiles produce idempotent stamps; update stampOverlayIdentity to
parse and preserve only the baseline and then append the new "ap=..." and
"nn=..." segments when userAP/userNN are present.
// Stage 1: project WITHOUT user-overlay (first-pass under transient
// fetch failure). Compute SyncChecksum_before.
mergedNoOverlay, _ := projectUserProfiles(cp, nil, nil, pod, "curl")
projectedNoOverlay := Apply(spec, mergedNoOverlay, tree)
syncBefore := projectedNoOverlay.SyncChecksum

// Stage 2: project WITH a user-overlay AP. Same baseline, same
// container. SyncChecksum_after MUST differ from SyncChecksum_before.
userAP := &v1beta1.ApplicationProfile{
	ObjectMeta: metav1.ObjectMeta{
		Name:            "curl-32-overlay",
		Namespace:       "ns",
		ResourceVersion: "12345",
	},
	Spec: v1beta1.ApplicationProfileSpec{
		Containers: []v1beta1.ApplicationProfileContainer{
			{
				Name:  "curl",
				Execs: []v1beta1.ExecCalls{{Path: "/bin/sh", Args: []string{"sh", "-c", "*"}}},
			},
		},
	},
}
mergedWithOverlay, _ := projectUserProfiles(cp, userAP, nil, pod, "curl")
projectedWithOverlay := Apply(spec, mergedWithOverlay, tree)
syncAfter := projectedWithOverlay.SyncChecksum

if syncBefore == syncAfter {
	t.Errorf("SyncChecksum did not change after user-overlay merge: before=%q after=%q. "+
		"The function_cache key won't invalidate when the overlay arrives, so "+
		"stale was_executed=false results poison the rule evaluator indefinitely. "+
		"Apply (projection_apply.go) must fold user-overlay identity (e.g. userAP.ResourceVersion) "+
		"into projected.SyncChecksum.",
		syncBefore, syncAfter)
}

// Stage 3: project with a DIFFERENT user-overlay AP (e.g., the
// overlay was updated post-deployment). SyncChecksum_third MUST
// differ from syncAfter so the cache picks up the change.
userAPUpdated := userAP.DeepCopy()
userAPUpdated.ResourceVersion = "12346"
userAPUpdated.Spec.Containers[0].Execs = append(userAPUpdated.Spec.Containers[0].Execs,
	v1beta1.ExecCalls{Path: "/bin/echo", Args: []string{"echo", "*"}})
mergedWithUpdated, _ := projectUserProfiles(cp, userAPUpdated, nil, pod, "curl")
projectedWithUpdated := Apply(spec, mergedWithUpdated, tree)
syncThird := projectedWithUpdated.SyncChecksum

if syncAfter == syncThird {
	t.Errorf("SyncChecksum did not change after user-overlay update (RV %s → %s, +1 Exec entry): "+
		"before-update=%q after-update=%q. Updates to the overlay won't invalidate cached lookups.",
		userAP.ResourceVersion, userAPUpdated.ResourceVersion, syncAfter, syncThird)
}
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Also pin the overlay-removal transition.
This test proves add/update invalidation, but the same cache path exists when an overlay disappears and SyncChecksum falls back to the baseline annotation. Add a final projection with nil overlay and assert that it differs from the overlay-stamped value too.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/objectcache/containerprofilecache/test32_projection_test.go` around lines
159 - 210, Add a final stage that tests the overlay-removal transition: call
projectUserProfiles(cp, nil, nil, pod, "curl") (same as Stage 1) after the
updated overlay stage, then Apply(spec, mergedNilOverlay, tree) and capture its
projected.SyncChecksum; assert that this checksum differs from the
overlay-stamped values (projectedWithOverlay.SyncChecksum and
projectedWithUpdated.SyncChecksum) to ensure cache invalidation when the overlay
is removed. Use the same helper functions projectUserProfiles and Apply and
compare against syncAfter and syncThird (or their projected variables) and emit
a t.Errorf if the checksum did not change.
Now that parse.get_exec_path + resolveExecPath both prefer absolute argv[0] over kernel exepath (fix 9a6eb35), runtime queries /bin/sh / /bin/echo / /usr/bin/curl — the symlink-as-invoked form. CompareExecArgs is strict positional so profile.Args[0] must agree. Revert the bare-name workaround (Args[0]="sh"/"echo" added in c3b692e when we thought event.exepath was empty); use the absolute- path form everywhere to match what eBPF actually captures.
@coderabbitai review
✅ Actions performed: Review triggered.
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness: No data available.
♻️ Duplicate comments (1)
tests/component_test.go (1)
3493-3494: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — Require successful exec before evaluating R0001/R0040 assertions.
These subtests still continue after ExecIntoPod errors; a failed exec can yield empty alerts and invalidate the precondition/assertion logic. Add require.NoError(t, err) immediately after each exec call.
Proposed patch:
@@
 stdout, stderr, err := wl.ExecIntoPod([]string{"sh", "-c", "echo hi"}, "curl")
 t.Logf("sh -c 'echo hi' → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+require.NoError(t, err, "exec must succeed before asserting R0001/R0040")
@@
 stdout, stderr, err := wl.ExecIntoPod([]string{"sh", "-x", "echo hi"}, "curl")
 t.Logf("sh -x 'echo hi' → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+require.NoError(t, err, "exec must succeed before asserting R0001/R0040")
@@
 stdout, stderr, err := wl.ExecIntoPod([]string{"echo", "hello", "world", "from", "test"}, "curl")
 t.Logf("echo hello world from test → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+require.NoError(t, err, "exec must succeed before asserting R0001/R0040")
@@
 stdout, stderr, err := wl.ExecIntoPod([]string{"echo", "goodbye", "world"}, "curl")
 t.Logf("echo goodbye world → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+require.NoError(t, err, "exec must succeed before asserting R0001/R0040")
Also applies to: 3512-3513, 3530-3531, 3549-3550
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/component_test.go` around lines 3493 - 3494, After each call to wl.ExecIntoPod (e.g., the call that returns stdout, stderr, err in the subtests), immediately assert success by adding require.NoError(t, err) so the test stops on exec failures before evaluating R0001/R0040 assertions; update the occurrences around the shown call and the other ExecIntoPod sites referenced (the calls near the ranges 3512-3513, 3530-3531, 3549-3550) to insert require.NoError(t, err) directly after the ExecIntoPod invocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 70e81bfa-236f-4a2d-a360-c9b445b4b785
📒 Files selected for processing (1)
tests/component_test.go
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness (AFTER only)
Event Counters
…pened_with_flags + R0003 noise gate in Test_09

Three NA fixes bundled, all related to the deferred backlog:

1. pkg/rulebindingmanager/cache/cache.go::RefreshRuleBindingsRules — non-blocking fan-out (CodeRabbit PR #43 review on cache.go:202). A slow or backlogged subscriber MUST NOT stall the refresh-rules path; that would gate every binding change agent-wide behind any single stuck consumer. Drop-on-full is safe because refresh notifications are coalesced pulses ("the rule set may have changed") — losing one is harmless, the next refresh re-pulses, and consumers' reconcile loops are idempotent. Pinned by new test TestRefreshRuleBindingsRules_NonBlockingFanout: 3 channels, one pre-filled to capacity, refresh must return within 2 s and the other two must receive the pulse.

2. ap.was_path_opened_with_flags — delete. The function silently degraded to path-only matching (flags projection is out of scope in v1) while keeping a flag-aware name. CodeRabbit PR #43 review flagged this as a silent contract weakening. The function was never referenced from any chart YAML (rule expressions in the NA test chart, bob chart, and storage all use was_path_opened directly), so it was dead code carrying a misleading promise. Deletion is the cleanest fix: it removes the silent-weakening pattern entirely, drops 3 stale test functions, and migrates the lone integration-test call to was_path_opened.

3. tests/component_test.go::Test_09_FalsePositiveTest — exclude R0003 (Syscalls Anomalies in container) from the FP gate. R0003 is structurally noisy on real apps: the baseline can never capture every syscall a workload will eventually make (rare error paths, late-startup allocations, GC, async I/O). The bob chart ships R0003 disabled by default for this reason; the NA test chart keeps it enabled because Test_10's 10b subtest explicitly asserts R0003 fires when the AP has an empty syscall list. Test_09's contract is about EXEC / OPEN / NETWORK / SIGNED anomalies, not syscall completeness — log the excluded R0003 count, assert 0 on everything else.
@coderabbitai review |
✅ Actions performed: Review triggered.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/rulemanager/cel/libraries/applicationprofile/ap.go (1)
51-151: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Keep a deprecated compatibility overload for ap.was_path_opened_with_flags.
Dropping this helper from the declarations map is a breaking CEL API change: any existing rules that still compile against the 3-argument form will now fail outright. Since the old implementation had already degraded to path-only matching, please keep a deprecated shim that ignores flags and delegates to ap.was_path_opened until downstream rules migrate.

♻️ Minimal compatibility shim

```diff
 "ap.was_path_opened": {
 	cel.Overload(
 		"ap_was_path_opened",
 		[]*cel.Type{cel.StringType, cel.StringType},
 		cel.BoolType,
 		cel.FunctionBinding(func(values ...ref.Val) ref.Val {
 			if len(values) != 2 {
 				return types.NewErr("expected 2 arguments, got %d", len(values))
 			}
 			if l.detailedMetrics && l.metrics != nil {
 				l.metrics.IncHelperCall("ap.was_path_opened")
 			}
 			wrapperFunc := func(args ...ref.Val) ref.Val {
 				return l.wasPathOpened(args[0], args[1])
 			}
 			cachedFunc := l.functionCache.WithCache(wrapperFunc, "ap.was_path_opened", cache.HashForContainerProfile(l.objectCache))
 			result := cachedFunc(values[0], values[1])
 			return cache.ConvertProfileNotAvailableErrToBool(result, false)
 		}),
 	),
 },
+"ap.was_path_opened_with_flags": {
+	cel.Overload(
+		"ap_was_path_opened_with_flags",
+		[]*cel.Type{cel.StringType, cel.StringType, cel.ListType(cel.StringType)},
+		cel.BoolType,
+		cel.FunctionBinding(func(values ...ref.Val) ref.Val {
+			if len(values) != 3 {
+				return types.NewErr("expected 3 arguments, got %d", len(values))
+			}
+			// Deprecated: flags are intentionally ignored for backward compatibility.
+			return l.wasPathOpened(values[0], values[1])
+		}),
+	),
+},
```
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/rulemanager/cel/libraries/applicationprofile/ap.go` around lines 51 - 151, Add a deprecated shim for the removed helper by adding an "ap.was_path_opened_with_flags" declaration that accepts three args, increments the "ap.was_path_opened_with_flags" metric when enabled, and delegates to the existing l.wasPathOpened (ignoring the third flags argument) using the same wrapper+caching pattern (use functionCache.WithCache and cache.ConvertProfileNotAvailableErrToBool) as the other helpers so existing 3-argument rules continue to work while deprecated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/rulebindingmanager/cache/cache.go`:
- Around line 200-215: The watch handlers still do blocking sends while holding
c.mutex causing a stalled update path; extract the fan-out here into a small
helper (eg. broadcastRuleBindingNotify) that: 1) takes a snapshot copy of the
current notifiers slice while holding c.mutex, 2) releases c.mutex, and 3)
iterates the snapshot doing non-blocking sends of
rulebindingmanager.RuleBindingNotify{} to each *n (select { case *n <- ...:
default: logger.L().Debug(...) }) using indexOfNotifier for diagnostics; then
replace the blocking sends in the add/modify/delete watch handlers and in
RefreshRuleBindings to call this helper so all notifications use the same
drop-on-full policy and avoid holding c.mutex while sending.
In `@pkg/rulemanager/cel/libraries/applicationprofile/open.go`:
- Around line 67-79: The current check in the open path logic uses
strings.HasSuffix/HasPrefix directly on cp.Opens.Patterns which can yield false
negatives for wildcardized pattern entries; instead, change the suffix/prefix
membership test to consult the precomputed concrete hits or the non-wildcardized
retained set (i.e., use cp.Opens.Values or the hit index used elsewhere) rather
than scanning cp.Opens.Patterns textually. Locate the logic around cp.Opens.All
and the loops over cp.Opens.Values and cp.Opens.Patterns in open.go (the
HasSuffix/HasPrefix checks) and replace the direct pattern string checks with
checks against the concrete retained entries/hit map (or expand patterns via the
same matcher used when recording hits) so only exact concrete paths or recorded
matches determine the result.
In `@tests/component_test.go`:
- Around line 3351-3360: The comments in tests/component_test.go are
inconsistent about Args[0]; align them to state Args[0] is the bare program name
(runtime argv[0]) not an absolute path. Update the comment block mentioning
"Args[0]" (the one referencing parse.get_exec_path and ap.was_executed) and the
later block around Lines 3407-3422 so they consistently say Args[0] = argv[0]
(bare program name, e.g., "sh"), and reference the recording-side behavior from
pkg/containerprofilemanager/v1/container_data.go getExecs for clarity.
---
Outside diff comments:
In `@pkg/rulemanager/cel/libraries/applicationprofile/ap.go`:
- Around line 51-151: Add a deprecated shim for the removed helper by adding an
"ap.was_path_opened_with_flags" declaration that accepts three args, increments
the "ap.was_path_opened_with_flags" metric when enabled, and delegates to the
existing l.wasPathOpened (ignoring the third flags argument) using the same
wrapper+caching pattern (use functionCache.WithCache and
cache.ConvertProfileNotAvailableErrToBool) as the other helpers so existing
3-argument rules continue to work while deprecated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: a6ca7495-feac-4163-b406-17c5da2f6898
📒 Files selected for processing (7)
pkg/rulebindingmanager/cache/cache.go
pkg/rulebindingmanager/cache/cache_test.go
pkg/rulemanager/cel/libraries/applicationprofile/ap.go
pkg/rulemanager/cel/libraries/applicationprofile/integration_test.go
pkg/rulemanager/cel/libraries/applicationprofile/open.go
pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
tests/component_test.go
💤 Files with no reviewable changes (1)
- pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
```go
// Non-blocking fan-out: a slow or backlogged subscriber must not stall the
// cache refresh path (CodeRabbit PR #43 review on cache.go:202). The
// refresh notification is a coalesced "the rule set may have changed"
// pulse — losing one is harmless because the next refresh will re-pulse,
// and consumers' reconcile loops are idempotent. Drop-on-full is the
// right policy: the alternative (blocking send) deadlocks
// RefreshRuleBindingsRules behind any single stuck subscriber, which
// gates every binding change agent-wide.
for _, n := range notifiers {
	select {
	case *n <- rulebindingmanager.RuleBindingNotify{}:
	default:
		logger.L().Debug("RBCache - notifier channel full, dropping refresh pulse",
			helpers.Int("notifierIndex", indexOfNotifier(notifiers, n)))
	}
}
```
Apply the same non-blocking notifier policy to the watch handlers.
The drop-on-full fan-out is only implemented here. Line 139, Line 163, and Line 183 still do blocking sends while holding c.mutex, so a full subscriber can still stall cache updates on add/modify/delete events. Please route all notifier fan-out through the same snapshot + non-blocking helper; otherwise this only fixes manual refreshes.
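The snapshot-then-broadcast helper the reviewer describes could look like the sketch below. The `cache` and `RuleBindingNotify` types here are simplified stand-ins for the real rulebindingmanager types, and `broadcastRuleBindingNotify` is the name the review suggests, not existing code:

```go
package main

import (
	"fmt"
	"sync"
)

type RuleBindingNotify struct{}

type cache struct {
	mutex     sync.Mutex
	notifiers []*chan RuleBindingNotify
}

// broadcastRuleBindingNotify snapshots the notifier slice under the lock,
// releases the lock, then does non-blocking (drop-on-full) sends, so one
// stuck subscriber cannot stall add/modify/delete handlers or the
// refresh path while c.mutex is held.
func (c *cache) broadcastRuleBindingNotify() {
	c.mutex.Lock()
	snapshot := make([]*chan RuleBindingNotify, len(c.notifiers))
	copy(snapshot, c.notifiers)
	c.mutex.Unlock()

	for i, n := range snapshot {
		select {
		case *n <- RuleBindingNotify{}:
		default:
			// Dropping is safe: pulses are coalesced and re-sent on the
			// next refresh.
			fmt.Printf("notifier %d full, dropping refresh pulse\n", i)
		}
	}
}

func main() {
	full := make(chan RuleBindingNotify, 1)
	full <- RuleBindingNotify{} // simulate a backlogged subscriber at capacity
	open := make(chan RuleBindingNotify, 1)

	c := &cache{notifiers: []*chan RuleBindingNotify{&full, &open}}
	c.broadcastRuleBindingNotify() // returns immediately instead of blocking
	fmt.Println("open subscriber received pulse:", len(open) == 1)
}
```

With this shape, the watch handlers and RefreshRuleBindingsRules all share one drop-on-full policy instead of duplicating send loops.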
```go
if cp.Opens.All {
	// All entries retained — scan to check for the suffix.
	for openPath := range cp.Opens.Values {
		if strings.HasSuffix(openPath, suffixStr) {
			return types.Bool(true)
		}
	}
	for _, openPath := range cp.Opens.Patterns {
		if strings.HasSuffix(openPath, suffixStr) {
			return types.Bool(true)
		}
	}
	return types.Bool(false)
```
Do not use raw prefix/suffix checks on wildcardized Patterns.
When cp.Opens.Patterns contains dynamic-path forms, strings.HasPrefix / strings.HasSuffix on the pattern text can produce false negatives for concrete observed paths. A retained pattern like /var/log/pods/*/volumes/... will not match a query for the concrete prefix /var/log/pods/123/, even though that open was seen. Please preserve concrete-path semantics here by driving these lookups from precomputed hits or from non-wildcardized retained values instead of scanning Patterns directly.
Also applies to: 111-123
```go
// AP overlay declares 4 allowed exec patterns for the curl pod. Profile
// shape:
//   - Path = full kernel-resolved exec path (used by parse.get_exec_path
//     + ap.was_executed for path-level matching)
//   - Args[0] = BARE program name (matches runtime argv[0] as captured by
//     eBPF; kubectl-exec'd processes have argv[0]="sh", not
//     "/bin/sh"). This mirrors the recording-side convention in
//     pkg/containerprofilemanager/v1/container_data.go where
//     getExecs() slices [path, ...argv] into (Path=resolved,
//     Args=argv-including-argv[0]).
```
Align the Args[0] contract comments—they currently conflict.
Line 3355 says Args[0] is a bare program name, while Line 3407+ says Args[0] is an absolute path. Keep one consistent contract here to avoid future test/profile drift.
📝 Suggested comment-only fix

```diff
-// - Args[0] = BARE program name (matches runtime argv[0] as captured by
-//   eBPF; kubectl-exec'd processes have argv[0]="sh", not
-//   "/bin/sh"). This mirrors the recording-side convention in
+// - Args[0] = absolute invoked path (matches runtime argv[0] in these
+//   kubectl-exec scenarios; e.g. "/bin/sh"). This mirrors the
+//   recording-side convention in
```

Also applies to: 3407-3422
29/31 run showed R0003 was excluded correctly (4 alerts dropped) but R0006 'Unexpected service account token access' still fired and broke the assert. Same structural-noise pattern as R0003: every pod with a service account legitimately reads /var/run/secrets/.../token to authenticate to the K8s API; hipster-shop services, prometheus, and alertmanager all fire R0006 on startup and on every API call.

Generalise the filter into a noisyRules map so future rule additions that are structurally noisy on benign workloads don't require touching the loop body, just the map. Also log each non-noisy FP with its rule_id / rule_name / comm / container_name so the next regression surfaces with actionable detail instead of just a count.
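The noisyRules map and the per-FP logging described above might be shaped like this sketch. The `alert` struct and `gateFalsePositives` function are illustrative, not the actual test code; only the rule IDs and the logged fields come from the commit message:

```go
package main

import "fmt"

// noisyRules lists rule IDs excluded from the false-positive gate because
// they are structurally noisy on benign workloads. Adding a future noisy
// rule means touching only this map, not the loop body.
var noisyRules = map[string]string{
	"R0003": "Syscalls Anomalies in container",
	"R0006": "Unexpected service account token access",
}

type alert struct {
	ruleID, ruleName, comm, containerName string
}

// gateFalsePositives counts excluded noisy alerts and logs each non-noisy
// FP with actionable detail before returning it for the zero-assert.
func gateFalsePositives(alerts []alert) (unexpected []alert, excluded int) {
	for _, a := range alerts {
		if _, noisy := noisyRules[a.ruleID]; noisy {
			excluded++
			continue
		}
		fmt.Printf("FP: rule_id=%s rule_name=%s comm=%s container=%s\n",
			a.ruleID, a.ruleName, a.comm, a.containerName)
		unexpected = append(unexpected, a)
	}
	return unexpected, excluded
}

func main() {
	alerts := []alert{
		{"R0003", "Syscalls Anomalies in container", "curl", "curl"},
		{"R0006", "Unexpected service account token access", "server", "frontend"},
	}
	un, ex := gateFalsePositives(alerts)
	fmt.Printf("excluded=%d unexpected=%d\n", ex, len(un))
}
```

The test would then log the excluded count and assert len(unexpected) == 0.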
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness: No data available.
Event Counters
Performance Benchmark Results: Node-Agent Resource Usage
Dedup Effectiveness (AFTER only)
Event Counters
Summary
Brings the node-agent fork's `main` up to upstream parity AND recovers the fork's wildcard + tamper-detection behaviour on top of upstream's projection-v1 reshape.
Supersedes PR #41 (wildcards) and PR #42 (matcher cache). The matcher cache is intentionally NOT in this branch — upstream's projection-v1 (kubescape#788/kubescape#799) already pre-compiles + caches per-profile, so a per-call matcher cache would duplicate effort.
Upstream commits merged (8)
Wildcard + signing recovery (fork-only, layered ON TOP of projection)
Network wildcards (spec §5.7, §5.8)
Exec-args matching (restored on projection)
Tamper detection (signing)
Mock parity
Known port/protocol gap (out-of-scope here, in the spec)
`was_address_port_protocol_in_*` still degrades to address-only matching on the projection. Restoring port/protocol requires an `AddressPortsByAddr` projection field that upstream noted as future work. Spec §4.7 describes per-neighbor port lists; the unit tests document the degradation (`wildcard_test.go:340-380`). No production rule in `tests/chart/templates/node-agent/default-rules.yaml` uses this helper, so component tests stay green.
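One possible shape for the future `AddressPortsByAddr` projection field sketched here is purely hypothetical — upstream has only named the field as future work, and the struct and lookup below are not defined anywhere yet:

```go
package main

import "fmt"

// portProtocol is a hypothetical per-neighbor entry (spec §4.7 describes
// per-neighbor port lists; the concrete shape is an assumption).
type portProtocol struct {
	Port     uint16
	Protocol string // e.g. "TCP" or "UDP"
}

// projection is a stand-in for the profile projection, extended with the
// per-address index that would let was_address_port_protocol_in_* stop
// degrading to address-only matching.
type projection struct {
	AddressPortsByAddr map[string][]portProtocol
}

func (p *projection) wasAddressPortProtocolSeen(addr string, port uint16, proto string) bool {
	for _, pp := range p.AddressPortsByAddr[addr] {
		if pp.Port == port && pp.Protocol == proto {
			return true
		}
	}
	return false
}

func main() {
	p := &projection{AddressPortsByAddr: map[string][]portProtocol{
		"10.0.0.5": {{Port: 443, Protocol: "TCP"}},
	}}
	fmt.Println(p.wasAddressPortProtocolSeen("10.0.0.5", 443, "TCP")) // seen pair
	fmt.Println(p.wasAddressPortProtocolSeen("10.0.0.5", 80, "TCP"))  // unseen port
}
```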
Test plan
Storage dep
`go.mod` pinned to `k8sstormcenter/storage@b23d85f0` = `merge/upstream-profile-rearch` on the storage fork (companion PR #31 there).