This repository was archived by the owner on May 6, 2026. It is now read-only.

Investigate e2e test failures and re-enable BPF-related tests #164

Description

@gauravkghildiyal

A new test added in #163 seems to be surfacing failures caused by these three BPF-related tests:

dranet/tests/e2e.bats, lines 156 to 228 at commit 27e2770:

```bash
# Test case for validating ebpf attributes are exposed via resource slice.
@test "validate bpf filter attributes" {
  docker cp "$BATS_TEST_DIRNAME"/dummy_bpf.o "$CLUSTER_NAME"-worker2:/dummy_bpf.o
  docker exec "$CLUSTER_NAME"-worker2 bash -c "ip link add dummy5 type dummy"
  docker exec "$CLUSTER_NAME"-worker2 bash -c "ip link set up dev dummy5"
  docker exec "$CLUSTER_NAME"-worker2 bash -c "tc qdisc add dev dummy5 clsact"
  docker exec "$CLUSTER_NAME"-worker2 bash -c "tc filter add dev dumm5 ingress bpf direct-action obj dummy_bpf.o sec classifier"
  run docker exec "$CLUSTER_NAME"-worker2 bash -c "tc filter show dev dummy5 ingress"
  assert_success
  assert_output --partial "dummy_bpf.o:[classifier] direct-action"
  for attempt in {1..4}; do
    run kubectl get resourceslices --field-selector spec.nodeName="$CLUSTER_NAME"-worker2 -o jsonpath='{.items[0].spec.devices[?(@.name=="dummy5")].attributes.dra\.net\/ebpf.bool}'
    if [ "$status" -eq 0 ] && [[ "$output" == "true" ]]; then
      break
    fi
    if (( attempt < 4 )); then
      sleep 5
    fi
  done
  assert_success
  assert_output "true"
  # Validate bpfName attribute
  run kubectl get resourceslices --field-selector spec.nodeName="$CLUSTER_NAME"-worker2 -o jsonpath='{.items[0].spec.devices[?(@.name=="dummy5")].attributes.dra\.net\/tcFilterNames.string}'
  assert_success
  assert_output "dummy_bpf.o:[classifier]"
}

# This reuses previous test
@test "validate tcx bpf filter attributes" {
  docker cp "$BATS_TEST_DIRNAME"/dummy_bpf_tcx.o "$CLUSTER_NAME"-worker2:/dummy_bpf_tcx.o
  docker exec "$CLUSTER_NAME"-worker2 bash -c "curl --connect-timeout 5 --retry 3 -L https://github.com/libbpf/bpftool/releases/download/v7.5.0/bpftool-v7.5.0-amd64.tar.gz | tar -xz"
  docker exec "$CLUSTER_NAME"-worker2 bash -c "chmod +x bpftool"
  docker exec "$CLUSTER_NAME"-worker2 bash -c "./bpftool prog load dummy_bpf_tcx.o /sys/fs/bpf/dummy_prog_tcx"
  docker exec "$CLUSTER_NAME"-worker2 bash -c "./bpftool net attach tcx_ingress pinned /sys/fs/bpf/dummy_prog_tcx dev dummy5"
  run docker exec "$CLUSTER_NAME"-worker2 bash -c "./bpftool net show dev dummy5"
  assert_success
  assert_output --partial "tcx/ingress handle_ingress prog_id"
  # Wait for the interface to be discovered
  sleep 5
  # Validate bpf attribute is true
  run kubectl get resourceslices --field-selector spec.nodeName="$CLUSTER_NAME"-worker2 -o jsonpath='{.items[0].spec.devices[?(@.name=="dummy5")].attributes.dra\.net\/ebpf.bool}'
  assert_success
  assert_output "true"
  # Validate bpfName attribute
  run kubectl get resourceslices --field-selector spec.nodeName="$CLUSTER_NAME"-worker2 -o jsonpath='{.items[0].spec.devices[?(@.name=="dummy5")].attributes.dra\.net\/tcxProgramNames.string}'
  assert_success
  assert_output "handle_ingress"
}

# This reuses previous test
@test "validate bpf programs are removed" {
  kubectl apply -f "$BATS_TEST_DIRNAME"/../examples/deviceclass.yaml
  kubectl apply -f "$BATS_TEST_DIRNAME"/../examples/resourceclaim_disable_ebpf.yaml
  kubectl wait --for=condition=ready pod/pod-ebpf --timeout=300s
  run kubectl exec pod-ebpf -- ash -c "curl --connect-timeout 5 --retry 3 -L https://github.com/libbpf/bpftool/releases/download/v7.5.0/bpftool-v7.5.0-amd64.tar.gz | tar -xz && chmod +x bpftool"
  assert_success
  run kubectl exec pod-ebpf -- ash -c "./bpftool net show dev dummy5"
  assert_success
  refute_output --partial "tcx/ingress handle_ingress prog_id"
  refute_output --partial "dummy_bpf.o:[classifier]"
  kubectl delete -f "$BATS_TEST_DIRNAME"/../examples/resourceclaim_disable_ebpf.yaml
  kubectl delete -f "$BATS_TEST_DIRNAME"/../examples/deviceclass.yaml
}
```
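Note that the suite never tears down the node state it creates: the dummy5 interface stays up, the downloaded bpftool binary is left behind, and, unless dranet itself removes it, the tcx program stays pinned at /sys/fs/bpf/dummy_prog_tcx (the third test only checks that the programs are detached from the device). Below is a hypothetical cleanup sketch mirroring the setup commands above; whether any of this leftover state is connected to the failure described next is exactly what needs investigating.

```bash
# Hypothetical teardown for the tests above (not part of the suite today).
# The detach/del commands mirror the setup steps; "|| true" keeps cleanup
# going if dranet already detached the programs in the last test.
docker exec "$CLUSTER_NAME"-worker2 bash -c "./bpftool net detach tcx_ingress dev dummy5 || true"
docker exec "$CLUSTER_NAME"-worker2 bash -c "rm -f /sys/fs/bpf/dummy_prog_tcx"
docker exec "$CLUSTER_NAME"-worker2 bash -c "tc filter del dev dummy5 ingress || true"
docker exec "$CLUSTER_NAME"-worker2 bash -c "ip link del dummy5"
```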

If a test that runs after the above BPF tests restarts the dranet pod on the "-worker2" node, the dranet pod's container fails to be created with the error:

```
Error: failed to generate container "dd27c26e7ba91d111760a9070b946fb1aa768181fb76be0d41430d30326bc3f3" spec: failed to apply OCI options: path "/sys/fs/bpf" is mounted on "/sys/fs/bpf" but it is not a shared or slave mount
```

It's still unclear whether this is a container runtime failure, or whether we are doing something obscure in these tests that we should not be doing.
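One way to narrow this down could be to check the propagation mode of the /sys/fs/bpf mount on the node before and after the BPF tests run. The snippet below is a diagnostic sketch, not a confirmed fix: findmnt and mount --make-shared are standard util-linux commands, but the idea that the tests leave behind a private bpffs mount (bpftool mounts bpffs at the target path itself when pinning if nothing is mounted there) is an assumption.

```bash
# Inspect /sys/fs/bpf on the kind node; a "private" value in the PROPAGATION
# column would line up with the "not a shared or slave mount" error above.
docker exec "$CLUSTER_NAME"-worker2 findmnt -o TARGET,FSTYPE,PROPAGATION /sys/fs/bpf

# If the mount turns out to be private, re-marking it shared is one candidate
# workaround to try before re-enabling the tests.
docker exec "$CLUSTER_NAME"-worker2 mount --make-shared /sys/fs/bpf
```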
