66 commits
86397a8
Changes needed to get clickhouse e2e test working with external click…
ddelnano Apr 13, 2026
f590005
Implement parquet export format
ddelnano Apr 16, 2026
3510794
Allow prometheus recorders to specify different kubeconfig or kubeco…
ddelnano Apr 16, 2026
5a8fb65
Fix parquet file overflow bug
ddelnano Apr 16, 2026
17188d5
Add duck db wasm visualization file
ddelnano Apr 16, 2026
63f7d5f
Temporary changes to make load testing easier
ddelnano Apr 17, 2026
839af02
Add clickhouse perf_tool suite, ability to query cross kubeconfig/kub…
ddelnano Apr 20, 2026
06a8d3a
Ensure px delete works with external k8s ApiService
ddelnano Apr 20, 2026
1f9c121
Add github workflow for perf clickhouse suite
ddelnano Apr 22, 2026
5ecab7c
Ignore non-alphabetic characters in the service account json
ddelnano Apr 22, 2026
5112a10
Add tailscale debugging info for perf workflow
ddelnano Apr 22, 2026
bb80ebb
Initial sovereign_soc suite, which segfaults kelvin on first run
ddelnano Apr 22, 2026
f1302fd
Fix segfault issues, but fails with missing alerts clickhouse table
ddelnano Apr 22, 2026
cf29e2b
Add --skaffold_stderr_file to perf_tool to ease github workflow debug…
ddelnano Apr 23, 2026
026e3eb
Add x86_64_sysroot in profile
ddelnano Apr 23, 2026
6dd6107
Don't use verbose logging
ddelnano Apr 23, 2026
267ea25
Remove verbosity flag that was missed
ddelnano Apr 24, 2026
5c0eb9f
fix protocol_loadtest build
ddelnano Apr 24, 2026
d9b9adc
Install the px cli
ddelnano Apr 24, 2026
78f2853
Use correct cloud
ddelnano Apr 24, 2026
eb1abb3
Reduce test time
ddelnano Apr 24, 2026
dfcf602
Get redis-attack experiment working
ddelnano Apr 25, 2026
1d6ad69
Add perf github action for soc attack
ddelnano Apr 25, 2026
7cf848f
Don't let cronjobs fail the build
ddelnano Apr 25, 2026
1790956
Only attempt job once
ddelnano Apr 25, 2026
8af6f8a
experiment with the adaptive feature
entlein Apr 20, 2026
756d88d
settings for lab as default
entlein Apr 21, 2026
09e28ba
not sure about the scheduler annotations, but the main.go now sets th…
entlein Apr 22, 2026
e2e124b
address linting issues 1
entlein Apr 23, 2026
b7b0389
pinning trivy to higher version
entlein Apr 23, 2026
4868412
linting part 2
entlein Apr 23, 2026
e89641d
linting part 3
entlein Apr 23, 2026
be963f5
linting part 4
entlein Apr 23, 2026
c293d90
Fix and modernize release workflows, complete ghcr.io migration, and …
ddelnano Apr 27, 2026
689ce7b
redesigning the adaptive write
entlein May 7, 2026
9ce8730
addressing the rabbit;
entlein May 8, 2026
487be4a
adaptive_export/trigger: dedupe at watermark boundary
May 8, 2026
9e8f74d
adaptive_export/trigger: validate identifiers + cover dedup with a test
May 8, 2026
546e03d
adaptive_export/trigger: stricter Endpoint validation, streaming pars…
May 8, 2026
79c60c1
adaptive_export/cmd: ADAPTIVE_SKIP_APPLY env to opt out of in-process…
May 8, 2026
996b2cb
adaptive_export: fix event_time unit + preset-script bootstrap
May 8, 2026
4243bdf
adaptive_export/cmd: built-in preset scripts fallback
May 8, 2026
428a2aa
adaptive_export/cmd: add internal/script bazel dep for builtin presets
May 8, 2026
c731d4d
adaptive_export: parse CH UInt64 wire format + diagnostic logs on pre…
May 8, 2026
7a88b4f
adaptive_export/cmd: log cluster + preset script names on install
May 8, 2026
950a2c5
adaptive_export: ignore cloud presets, install builtins, purge stale
May 8, 2026
d284491
adaptive_export: rev-1 push path (operator queries pixie + writes CH)
May 8, 2026
7e4b786
adaptive_export/cmd: skip dotted-name tables from push list (PxL limi…
May 8, 2026
b8a90ca
adaptive_export/controller: instrument pushPixieRows + per-query timeout
May 8, 2026
a79b373
adaptive_export/pxl: filter pod by namespaced key (px.upid_to_pod_nam…
May 9, 2026
59f4d68
adaptive_export/pixieapi: direct-mode JWT path bypassing cloud passth…
May 9, 2026
98ac1f0
addressing the rabbit2
entlein May 9, 2026
90c6858
adaptive_export/controller: periodic re-fan-out for full window coverage
May 9, 2026
e4329d1
addressing the rabbit3
entlein May 9, 2026
9f91360
addressing the rabbit4
entlein May 9, 2026
bb11514
addressing the rabbit5
entlein May 9, 2026
feb3a03
addressing the rabbit6
entlein May 9, 2026
e84cbac
addressing the rabbit7
entlein May 9, 2026
b599e77
addressing the rabbit8
entlein May 9, 2026
b386ce8
addressing the rabbit9
entlein May 9, 2026
9b74bc7
addressing the rabbit10
entlein May 9, 2026
833c5e5
fix perf soc eval test
entlein May 14, 2026
3b7bcab
perf soc workflow: set SOC_VIZIER_EXISTING to bind to running Vizier
May 14, 2026
02df73d
perf_tool/px deploy: diagnose + fix SetClusterID for existing-Vizier …
May 14, 2026
1d6a93e
adding load test yamls
entlein May 14, 2026
8083eeb
Merge branch 'entlein/adaptive-write-perf' of https://github.com/k8ss…
entlein May 14, 2026
158 changes: 158 additions & 0 deletions .github/workflows/perf_clickhouse.yaml
@@ -0,0 +1,158 @@
---
name: perf-eval-clickhouse
on:
  workflow_dispatch:
    inputs:
      ref:
        description: 'Branch or commit'
        required: false
        type: string
      tags:
        description: 'Tags (comma separated)'
        required: false
        type: string
permissions:
  contents: read
  packages: write
jobs:
  get-dev-image-with-extras:
    uses: ./.github/workflows/get_image.yaml
    with:
      image-base-name: "dev_image_with_extras"
      ref: ${{ inputs.ref }}

  clickhouse-export-perf:
    name: ClickHouse export perf eval
    needs: get-dev-image-with-extras
    runs-on: oracle-vm-16cpu-64gb-x86-64
    container:
      image: ${{ needs.get-dev-image-with-extras.outputs.image-with-tag }}
      options: --cap-add=NET_ADMIN --device=/dev/net/tun
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          ref: ${{ inputs.ref }}
          fetch-depth: 0
      - name: Add pwd to git safe dir
        run: git config --global --add safe.directory `pwd`
      - id: get-commit-sha
        run: echo "commit-sha=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

      # TODO(ddelnano): swap TAILSCALE_AUTH_KEY for an OAuth client once one is
      # provisioned in the k8sstormcenter tailnet. Use
      # `tailscale/github-action@v2` with `oauth-client-id` and `oauth-secret`
      # inputs (`TS_OAUTH_CLIENT_ID` / `TS_OAUTH_CLIENT_SECRET` secrets) so
      # credentials rotate automatically instead of expiring on a fixed cadence.
      - name: Start Tailscale sidecar
        env:
          TS_AUTHKEY: ${{ secrets.TAILSCALE_AUTH_KEY }}
        run: |
          curl -fsSL https://tailscale.com/install.sh | sh
          mkdir -p /var/run/tailscale /var/lib/tailscale
          tailscaled \
            --socket=/var/run/tailscale/tailscaled.sock \
            --state=/var/lib/tailscale/tailscaled.state &
          until tailscale status --json >/dev/null 2>&1; do sleep 1; done
          tailscale up \
            --authkey="${TS_AUTHKEY}" \
            --accept-routes \
            --hostname="pixie-perf-ci-${GITHUB_RUN_ID}"

      - name: Write kubeconfig
        env:
          KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          mkdir -p "${RUNNER_TEMP}"
          echo "${KUBECONFIG_B64}" | base64 -d > "${RUNNER_TEMP}/kubeconfig"
          chmod 600 "${RUNNER_TEMP}/kubeconfig"

      # Fail fast if Tailscale can't reach the cluster API, before the 2+ minute
      # bazel/skaffold build wastes time.
      - name: Tailscale connectivity probe
        env:
          KUBECONFIG: ${{ runner.temp }}/kubeconfig
        run: |
          tailscale status
          tailscale netcheck
          api_host="$(kubectl --kubeconfig="$KUBECONFIG" config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed -E 's|https?://||; s|/.*||')"
          api_ip="${api_host%%:*}"
          api_port="${api_host##*:}"
          echo "--- tailscale ping ${api_ip} ---"
          tailscale ping --c 3 --until-direct=false "${api_ip}" || true
          echo "--- tcp probe ${api_ip}:${api_port} ---"
          timeout 5 bash -c "</dev/tcp/${api_ip}/${api_port}" \
            && echo "API port reachable" \
            || { echo "API port UNREACHABLE"; exit 1; }
          echo "--- kubectl get nodes ---"
          kubectl --kubeconfig="$KUBECONFIG" get nodes

      - name: Use github bazel config
        uses: ./.github/actions/bazelrc
        with:
          download_toplevel: 'true'
          BB_API_KEY: ${{ secrets.BB_IO_API_KEY }}

      # TODO(ddelnano): revert to `./.github/actions/gcloud_creds` once GCP_SA_KEY
      # is re-uploaded with `base64 -w0`. The shared composite uses plain
      # `base64 --decode` which rejects the wrapped (multi-line/CRLF) value
      # currently stored in the secret.
      - id: gcloud-creds
        env:
          SERVICE_ACCOUNT_KEY: ${{ secrets.GCP_SA_KEY }}
        run: |
          printf '%s' "$SERVICE_ACCOUNT_KEY" | base64 -di > /tmp/gcloud.json
          chmod 600 /tmp/gcloud.json
          echo "gcloud-creds=/tmp/gcloud.json" >> $GITHUB_OUTPUT
      - name: Activate gcloud service account
        env:
          GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }}
        run: |
          service_account="$(jq -r '.client_email' "$GOOGLE_APPLICATION_CREDENTIALS")"
          gcloud auth activate-service-account "${service_account}" --key-file="$GOOGLE_APPLICATION_CREDENTIALS"
          gcloud auth configure-docker

      - name: Log in to GHCR
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: echo "${GH_TOKEN}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin

      - name: Build and install px CLI
        run: |
          bazel build --config=x86_64_sysroot //src/pixie_cli:px
          install -m 0755 bazel-bin/src/pixie_cli/px_/px /usr/local/bin/px
          px version

      - name: Run clickhouse-export perf
        env:
          PX_API_KEY: ${{ secrets.PX_API_KEY }}
          GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }}
          KUBECONFIG: ${{ runner.temp }}/kubeconfig
        run: |
          bazel run //src/e2e_test/perf_tool:perf_tool -- run \
            --api_key="${PX_API_KEY}" \
            --cloud_addr=pixie.austrianopencloudcommunity.org:443 \
            --commit_sha="${{ steps.get-commit-sha.outputs.commit-sha }}" \
            --experiment_name=clickhouse-export \
            --suite=clickhouse-exec \
            --use_local_cluster \
            --export_backend=parquet-gcs \
            --gcs_bucket=k8sstormcenter-soc-perf \
            --container_repo=ghcr.io/k8sstormcenter \
            --prom_recorder_override 'clickhouse-operator=:k8ss-forensic' \
            --tags "${{ inputs.tags }}"

      - name: Upload skaffold stderr log
        if: always()
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        with:
          name: skaffold-stderr-${{ github.run_id }}-${{ github.run_attempt }}
          path: ${{ runner.temp }}/skaffold-stderr.log
          if-no-files-found: ignore

      - name: Deactivate gcloud service account
        if: always()
        run: gcloud auth revoke || true

      - name: Tailscale logout
        if: always()
        run: tailscale logout || true
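The gcloud-creds TODO above says the shared composite's strict `base64 --decode` rejects the wrapped (multi-line/CRLF) value currently stored in GCP_SA_KEY, while the workflow's `base64 -di` tolerates it. A minimal sketch of that difference, assuming GNU coreutils/sed; the JSON payload and wrap width are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: why `base64 -di` works on the wrapped secret but strict
# `base64 --decode` does not. The payload below is illustrative only.
set -euo pipefail

plain='{"client_email":"ci@example.iam.gserviceaccount.com"}'

# Simulate the problematic secret: wrapped into short lines with CRLF endings.
wrapped="$(printf '%s' "$plain" | base64 | fold -w 16 | sed 's/$/\r/')"

# Strict decode rejects the stray CR bytes (non-alphabet characters)...
if printf '%s' "$wrapped" | base64 --decode >/dev/null 2>&1; then
  echo "strict decode unexpectedly succeeded"
else
  echo "strict decode rejected wrapped input"
fi

# ...while -i (ignore non-alphabet characters) recovers the original bytes.
printf '%s' "$wrapped" | base64 -di

# Re-uploading the secret with `base64 -w0` (one line, no wrapping) would
# let the shared composite's strict decode work again:
printf '%s' "$plain" | base64 -w0 | base64 --decode
```

This is why the TODO asks for the secret to be re-encoded with `-w0` rather than keeping the `-di` workaround forever.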
159 changes: 159 additions & 0 deletions .github/workflows/perf_soc_attack.yaml
@@ -0,0 +1,159 @@
---
name: perf-eval-soc-attack
on:
  workflow_dispatch:
    inputs:
      ref:
        description: 'Branch or commit'
        required: false
        type: string
      tags:
        description: 'Tags (comma separated)'
        required: false
        type: string
permissions:
  contents: read
  packages: write
jobs:
  get-dev-image-with-extras:
    uses: ./.github/workflows/get_image.yaml
    with:
      image-base-name: "dev_image_with_extras"
      ref: ${{ inputs.ref }}

  soc-attack-perf:
    name: Sovereign SOC redis-attack perf eval
    needs: get-dev-image-with-extras
    runs-on: oracle-vm-16cpu-64gb-x86-64
    container:
      image: ${{ needs.get-dev-image-with-extras.outputs.image-with-tag }}
      options: --cap-add=NET_ADMIN --device=/dev/net/tun
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          ref: ${{ inputs.ref }}
          fetch-depth: 0
      - name: Add pwd to git safe dir
        run: git config --global --add safe.directory `pwd`
      - id: get-commit-sha
        run: echo "commit-sha=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

      # TODO(ddelnano): swap TAILSCALE_AUTH_KEY for an OAuth client once one is
      # provisioned in the k8sstormcenter tailnet. Use
      # `tailscale/github-action@v2` with `oauth-client-id` and `oauth-secret`
      # inputs (`TS_OAUTH_CLIENT_ID` / `TS_OAUTH_CLIENT_SECRET` secrets) so
      # credentials rotate automatically instead of expiring on a fixed cadence.
      - name: Start Tailscale sidecar
        env:
          TS_AUTHKEY: ${{ secrets.TAILSCALE_AUTH_KEY }}
        run: |
          curl -fsSL https://tailscale.com/install.sh | sh
          mkdir -p /var/run/tailscale /var/lib/tailscale
          tailscaled \
            --socket=/var/run/tailscale/tailscaled.sock \
            --state=/var/lib/tailscale/tailscaled.state &
          until tailscale status --json >/dev/null 2>&1; do sleep 1; done
          tailscale up \
            --authkey="${TS_AUTHKEY}" \
            --accept-routes \
            --hostname="pixie-perf-ci-${GITHUB_RUN_ID}"

      - name: Write kubeconfig
        env:
          KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          mkdir -p "${RUNNER_TEMP}"
          echo "${KUBECONFIG_B64}" | base64 -d > "${RUNNER_TEMP}/kubeconfig"
          chmod 600 "${RUNNER_TEMP}/kubeconfig"

      # Fail fast if Tailscale can't reach the cluster API, before the 2+ minute
      # bazel/skaffold build wastes time.
      - name: Tailscale connectivity probe
        env:
          KUBECONFIG: ${{ runner.temp }}/kubeconfig
        run: |
          tailscale status
          tailscale netcheck
          api_host="$(kubectl --kubeconfig="$KUBECONFIG" config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed -E 's|https?://||; s|/.*||')"
          api_ip="${api_host%%:*}"
          api_port="${api_host##*:}"
          echo "--- tailscale ping ${api_ip} ---"
          tailscale ping --c 3 --until-direct=false "${api_ip}" || true
          echo "--- tcp probe ${api_ip}:${api_port} ---"
          timeout 5 bash -c "</dev/tcp/${api_ip}/${api_port}" \
            && echo "API port reachable" \
            || { echo "API port UNREACHABLE"; exit 1; }
          echo "--- kubectl get nodes ---"
          kubectl --kubeconfig="$KUBECONFIG" get nodes

      - name: Use github bazel config
        uses: ./.github/actions/bazelrc
        with:
          download_toplevel: 'true'
          BB_API_KEY: ${{ secrets.BB_IO_API_KEY }}

      # TODO(ddelnano): revert to `./.github/actions/gcloud_creds` once GCP_SA_KEY
      # is re-uploaded with `base64 -w0`. The shared composite uses plain
      # `base64 --decode` which rejects the wrapped (multi-line/CRLF) value
      # currently stored in the secret.
      - id: gcloud-creds
        env:
          SERVICE_ACCOUNT_KEY: ${{ secrets.GCP_SA_KEY }}
        run: |
          printf '%s' "$SERVICE_ACCOUNT_KEY" | base64 -di > /tmp/gcloud.json
          chmod 600 /tmp/gcloud.json
          echo "gcloud-creds=/tmp/gcloud.json" >> $GITHUB_OUTPUT
      - name: Activate gcloud service account
        env:
          GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }}
        run: |
          service_account="$(jq -r '.client_email' "$GOOGLE_APPLICATION_CREDENTIALS")"
          gcloud auth activate-service-account "${service_account}" --key-file="$GOOGLE_APPLICATION_CREDENTIALS"
          gcloud auth configure-docker

      - name: Log in to GHCR
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: echo "${GH_TOKEN}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin

      - name: Build and install px CLI
        run: |
          bazel build --config=x86_64_sysroot //src/pixie_cli:px
          install -m 0755 bazel-bin/src/pixie_cli/px_/px /usr/local/bin/px
          px version

      # The sovereign-soc suite installs Kubescape + Vector on the experiment
      # cluster as part of the run (see KubescapeVectorWorkload). The
      # kubescape-operator chart is pre-rendered under
      # src/e2e_test/perf_tool/pkg/suites/k8s/sovereign-soc/helm-rendered/
      # and applied via PrerenderedDeploy, so no extra ./scripts step is needed.
      #
      # ClickHouse operator metrics are scraped on the forensic cluster via
      # the prom_recorder_override; the kubescape node-agent prom recorder
      # is intentionally NOT overridden — kubescape runs on the experiment
      # cluster (where redis+bobctl drive traffic), so the recorder uses the
      # default kubeconfig.
      - name: Run sovereign-soc redis-attack perf
        env:
          PX_API_KEY: ${{ secrets.PX_API_KEY }}
          GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }}
          KUBECONFIG: ${{ runner.temp }}/kubeconfig
          SOC_VIZIER_EXISTING: "1"
        run: |
          bazel run //src/e2e_test/perf_tool:perf_tool -- run \
            --api_key="${PX_API_KEY}" \
            --cloud_addr=pixie.austrianopencloudcommunity.org:443 \
            --commit_sha="${{ steps.get-commit-sha.outputs.commit-sha }}" \
            --experiment_name=redis-attack \
            --suite=sovereign-soc \
            --use_local_cluster \
            --export_backend=parquet-gcs \
            --gcs_bucket=k8sstormcenter-soc-perf \
            --container_repo=ghcr.io/k8sstormcenter \
            --prom_recorder_override 'clickhouse-operator=:k8ss-forensic' \
            --max_retries=1 \
            --tags "${{ inputs.tags }}"

      - name: Tailscale logout
        if: always()
        run: tailscale logout || true
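Both workflows' "Tailscale connectivity probe" step splits the API server URL into host and port using `sed` plus bash parameter expansion. A standalone sketch of just that parsing, with a made-up server address (this assumes an IPv4 `host:port` form; an IPv6 or port-less URL would need different handling):

```shell
#!/usr/bin/env bash
# Sketch of the host/port parsing from the connectivity probe step.
# The server URL is an illustrative example, not a real cluster endpoint.
set -euo pipefail

server='https://100.84.1.7:6443/'

# Strip the scheme and any path, leaving "host:port".
api_host="$(printf '%s' "$server" | sed -E 's|https?://||; s|/.*||')"
api_ip="${api_host%%:*}"    # remove the longest suffix starting at ':'
api_port="${api_host##*:}"  # remove the longest prefix ending at ':'

echo "host=${api_host} ip=${api_ip} port=${api_port}"
# → host=100.84.1.7:6443 ip=100.84.1.7 port=6443
```

Note that if the URL carries no explicit port, both expansions return the whole host, which is one reason the probe tolerates a failed `tailscale ping` with `|| true` before the hard TCP check.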
4 changes: 3 additions & 1 deletion .github/workflows/trivy_fs.yaml
@@ -23,7 +23,9 @@ jobs:
       security-events: write
     steps:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
-      - uses: aquasecurity/trivy-action@18f2510ee396bbf400402947b394f2dd8c87dbb0 # v0.29.0
+      # v0.36.0 released 2026-04-22 (post-incident). Internally SHA-pins
+      # setup-trivy@3fb12ec = Aqua's safe v0.2.6 per GHSA-69fq-xp46-6x23.
+      - uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
         with:
           scan-type: 'fs'
           ignore-unfixed: true
48 changes: 48 additions & 0 deletions chained-sweep.sh
@@ -0,0 +1,48 @@
#!/usr/bin/env bash
# chained-sweep.sh — wait for an in-flight perf-sweep to finish, then kick
# off a second (independent) sweep into a fresh /tmp/perf-sweep-<ts>/ dir
# with its own watcher. Use this when you want a clean before/after pair
# without having to be at the keyboard when the first one ends.
#
# Usage:
# ./chained-sweep.sh <first-sweep-dir>
# ./chained-sweep.sh /tmp/perf-sweep-20260514-114224
set -euo pipefail

FIRST="${1:?need path to first sweep dir}"
LOG=/tmp/chained-sweep.log
exec > >(tee -a "$LOG") 2>&1

echo "$(date -Is) waiting for first sweep to finish: $FIRST"
# perf-sweep.sh writes "sweep complete in N s — <dir>" as the last line
# of sweep.log when all multipliers landed.
while ! grep -q "sweep complete" "$FIRST/sweep.log" 2>/dev/null; do
  sleep 30
done
echo "$(date -Is) first sweep finished"

# Kick off second sweep (perf-sweep.sh creates its own timestamped dir).
# Tag the sweep.log with a header so it's obvious in the watcher output
# that this is the "after" run.
echo "$(date -Is) launching second sweep"
/home/constanze/code/pixie/perf-sweep.sh > /tmp/perf-sweep-second.stdout 2>&1 &
SWEEP_PID=$!

# Give perf-sweep.sh a moment to create its dir + sweep.log.
sleep 8
NEW=$(ls -dt /tmp/perf-sweep-2*/ 2>/dev/null | head -1)
NEW="${NEW%/}"
if [[ -z "$NEW" || "$NEW" == "$FIRST" ]]; then
  echo "$(date -Is) ERROR: second sweep dir not detected"
  exit 1
fi
echo "$(date -Is) second sweep dir: $NEW"

# Watcher for the new sweep (auto-exits when its sweep.log shows complete).
setsid bash /home/constanze/code/pixie/render-sweep-watch.sh "$NEW" \
  </dev/null > /tmp/render-watch-second.log 2>&1 &
disown
echo "$(date -Is) watcher launched for $NEW"

wait "$SWEEP_PID"
echo "$(date -Is) second sweep done"
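chained-sweep.sh polls `sweep.log` for the "sweep complete" sentinel with no upper bound, so it spins forever if the first sweep dies before writing that line. A hedged sketch of the same grep-poll pattern with an added deadline; the function name, paths, and timeout values are illustrative, not part of the repo:

```shell
#!/usr/bin/env bash
# Sketch: sentinel polling as in chained-sweep.sh, plus a deadline so the
# waiter fails loudly instead of spinning forever. Names are illustrative.
set -euo pipefail

wait_for_sentinel() {
  local log="$1" sentinel="$2" deadline_s="${3:-600}" waited=0
  # Poll until the sentinel line appears in the log, up to deadline_s seconds.
  while ! grep -q "$sentinel" "$log" 2>/dev/null; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge "$deadline_s" ]; then
      echo "timed out waiting for '$sentinel' in $log" >&2
      return 1
    fi
  done
}

# Usage: a background writer appends the sentinel after 2s; the waiter
# returns as soon as it shows up.
log="$(mktemp)"
( sleep 2; echo "sweep complete in 2 s - $log" >> "$log" ) &
wait_for_sentinel "$log" "sweep complete" 30
echo "sentinel observed"
```

The `2>/dev/null` on grep matters for the same reason it does in chained-sweep.sh: the log file may not exist yet when polling starts.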
8 changes: 4 additions & 4 deletions docker.properties
@@ -1,4 +1,4 @@
-DOCKER_IMAGE_TAG=202512082352
-LINTER_IMAGE_DIGEST=441fc5a65697dab0b38627d5afde9e38da6812f1a5b98732b224161c23238e73
-DEV_IMAGE_DIGEST=cac2e8a1c3e70dde4e5089b2383b2e11cc022af467ee430c12416eb42066fbb7
-DEV_IMAGE_WITH_EXTRAS_DIGEST=e84f82d62540e1ca72650f8f7c9c4fe0b32b64a33f04cf0b913b9961527c9e30
+DOCKER_IMAGE_TAG=202604270358
+LINTER_IMAGE_DIGEST=af984e837756bce44089d0f977146aee989b24a12884ba2366b4e6eaf19d9acb
+DEV_IMAGE_DIGEST=e4aec14294cff907e7dc3c4835950a4e166e503d32cae082418971e7f70d86bc
+DEV_IMAGE_WITH_EXTRAS_DIGEST=331a2391941c589d2b6536ae49794460b1097c482a45a11029d96a7d0d8d8030