Skip to content

tools: add benchmark for loading sets#535

Open
pzmarzly wants to merge 1 commit into
facebook:mainfrom
pzmarzly:push-voqxqossrrll
Open

tools: add benchmark for loading sets#535
pzmarzly wants to merge 1 commit into
facebook:mainfrom
pzmarzly:push-voqxqossrrll

Conversation

@pzmarzly
Copy link
Copy Markdown
Contributor

Measure time it takes to load a chain that contains a large set. This will be a common operation (especially since sets can update).

Requires #532.

$ rm -rf build && cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=release -DNO_DOCS=1 -DNO_CHECKS=1
$ ninja -C build benchmark_bin bfcli
$ sudo build/output/sbin/benchmark_bin \
    --cli build/output/sbin/bfcli \
    --srcdir . \
    --outfile /tmp/load.json \
    --filter chain_load

...
Run on (4 X 2396.4 MHz CPU s)
...
chain_load__ip4_saddr__x_elem_set/8/manual_time           7.20 ms         7.08 ms          106 load chain, ip4.saddr, 8 elements set
chain_load__ip4_saddr__x_elem_set/128/manual_time         7.69 ms         7.61 ms          107 load chain, ip4.saddr, 128 elements set
chain_load__ip4_saddr__x_elem_set/32768/manual_time       55.3 ms         61.5 ms           12 load chain, ip4.saddr, 32768 elements set

@pzmarzly pzmarzly requested a review from qdeslandes as a code owner May 12, 2026 17:19
@meta-cla meta-cla Bot added the cla signed label May 12, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

Claude review of PR #535 (64c0686)

Suggestions

  • SetIterationTime before error checktools/benchmarks/main.cpp:235state.SetIterationTime() is called before verifying bf_chain_set succeeded; reordering would make the intent clearer

Nits

  • Keyword-only params for Stats.failure()tools/benchmarks/bfbencher:65 — inserting commit_sha before from_cache changes positional contract; consider keyword-only (*)
  • Document commits ordering assumptiontools/benchmarks/bfbencher:1079executor.commits[-1] assumes topo-sort oldest-first; a brief comment would help readers
  • Comment on iteration pattern divergencetools/benchmarks/main.cpp:216for (auto _: state) differs from KeepRunningBatch used elsewhere; brief rationale comment would help

Workflow run

@pzmarzly pzmarzly force-pushed the push-voqxqossrrll branch from 66e1e63 to 64c0686 Compare May 12, 2026 17:27
@pzmarzly
Copy link
Copy Markdown
Contributor Author

@claude review

Comment thread tools/benchmarks/main.cpp
int ret = bf_chain_set(chainp.get(), nullptr);
const auto end = std::chrono::steady_clock::now();

if (ret < 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: state.SetIterationTime() is called before checking whether bf_chain_set succeeded. If the load fails, a meaningless duration is recorded before SkipWithError discards it. While Google Benchmark discards results from skipped iterations, reordering makes the intent clearer:

if (ret < 0) {
    state.SkipWithError("failed to load chain");
    break;
}

state.SetIterationTime(
    std::chrono::duration<double>(end - start).count());

Comment thread tools/benchmarks/bfbencher
Comment thread tools/benchmarks/bfbencher
Comment thread tools/benchmarks/main.cpp
const std::string chainName = "bf_benchmark";
uint32_t nelems = state.range(0);

for (auto _: state) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: for (auto _: state) is the first use of this iteration pattern in the file; every other benchmark uses while (state.KeepRunningBatch(...)). A one-line comment explaining why a different pattern is needed here (manual timing of a heavy kernel operation that cannot be batched) would help readers understand the divergence.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manual timing of a heavy kernel operation that cannot be batched

They probably can be batched though. The change from while to for was done for me by local Claude Code and I didn't catch it, but now it is a good moment to ask which one is more correct. Quentin what do you think? I'm leaning towards regular KeepRunningBatch, but I don't understand GBenchmark well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (auto _: state) is correct, so let's keep it. This lets Google Benchmark define the number of iterations to run the benchmark for.

We don't use it to benchmark BPF programs, because BPF_PROG_TEST_RUN also accept an iterations argument, so we use it instead and use KeepRunningBatch() to check if we need more iterations. That ensures benchmarks are run with a warm cache, which is more reflective of real-world use cases.

@qdeslandes qdeslandes force-pushed the push-voqxqossrrll branch from 64c0686 to 34cef1d Compare May 12, 2026 20:02
Comment thread tools/benchmarks/main.cpp
Comment on lines +217 to +229
Chain chain(chainName, BF_HOOK_XDP, BF_VERDICT_ACCEPT);
Set s = Set({BF_MATCHER_IP4_SADDR});

for (uint32_t i = 0; i < nelems; ++i)
s << uint32ToIp4(i);

chain << Rule(BF_VERDICT_DROP, std::nullopt, 0,
std::vector<Matcher> {
Matcher(BF_MATCHER_SET, BF_MATCHER_IN, {0, 0, 0, 0}),
});
chain << s;

auto chainp = chain.get();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done outside of the benchmarking loop.

Comment thread tools/benchmarks/main.cpp
Comment on lines +243 to +247
ret = bf_chain_flush(chainName.c_str());
if (ret < 0) {
state.SkipWithError("failed to flush chain");
break;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use

{
    benchmark::ScopedPauseTiming pause(state); // Pauses timing
    ret = bf_chain_flush(chainName.c_str());
    if (ret < 0) {
        state.SkipWithError("failed to flush chain");
        break;
    }
}

So you don't have to manually set the iteration time.

Comment thread tools/benchmarks/main.cpp
->Arg(1 << 7)
->Arg(1 << 15);

void chain_load__ip4_saddr__x_elem_set(::benchmark::State &state)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chain_set__ip4_saddr__x_elem_set

Comment thread tools/benchmarks/main.cpp
const std::string chainName = "bf_benchmark";
uint32_t nelems = state.range(0);

for (auto _: state) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (auto _: state) is correct, so let's keep it. This lets Google Benchmark define the number of iterations to run the benchmark for.

We don't use it to benchmark BPF programs, because BPF_PROG_TEST_RUN also accept an iterations argument, so we use it instead and use KeepRunningBatch() to check if we need more iterations. That ensures benchmarks are run with a warm cache, which is more reflective of real-world use cases.

Comment thread tools/benchmarks/main.cpp
Comment on lines +255 to +257
->Arg(1 << 3)
->Arg(1 << 7)
->Arg(1 << 15)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use different values to have a clear overview of the perf here:

  • 1 << 3 -> 8
  • 1 << 16 -> 65k
  • 1 << 22 -> ~4M

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants