tools: add benchmark for loading sets (#535)
Conversation
Claude review of PR #535 (64c0686): suggestions and nits
Force-pushed from 66e1e63 to 64c0686
@claude review
```cpp
int ret = bf_chain_set(chainp.get(), nullptr);
const auto end = std::chrono::steady_clock::now();

if (ret < 0) {
```
Claude: suggestion: state.SetIterationTime() is called before checking whether bf_chain_set() succeeded. If the load fails, a meaningless duration is recorded before SkipWithError() discards it. While Google Benchmark discards results from skipped iterations, reordering makes the intent clearer:

```cpp
if (ret < 0) {
    state.SkipWithError("failed to load chain");
    break;
}
state.SetIterationTime(
    std::chrono::duration<double>(end - start).count());
```

```cpp
const std::string chainName = "bf_benchmark";
uint32_t nelems = state.range(0);

for (auto _: state) {
```
Claude: nit: for (auto _: state) is the first use of this iteration pattern in the file; every other benchmark uses while (state.KeepRunningBatch(...)). A one-line comment explaining why a different pattern is needed here (manual timing of a heavy kernel operation that cannot be batched) would help readers understand the divergence.
> manual timing of a heavy kernel operation that cannot be batched
They probably can be batched, though. The change from while to for was made for me by a local Claude Code session and I didn't catch it, but now is a good moment to ask which one is more correct. Quentin, what do you think? I'm leaning towards regular KeepRunningBatch(), but I don't understand GBenchmark well.
for (auto _: state) is correct, so let's keep it. This lets Google Benchmark define the number of iterations to run the benchmark for.
We don't use it to benchmark BPF programs because BPF_PROG_TEST_RUN also accepts an iterations argument, so we use that instead and call KeepRunningBatch() to check whether we need more iterations. That ensures benchmarks are run with a warm cache, which is more reflective of real-world use cases.
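The difference between the two loop styles can be modeled without the real library. Below is a self-contained toy sketch (ToyState is an illustrative stand-in, not Google Benchmark's actual benchmark::State): with a fixed iteration budget, a KeepRunning()-style loop makes one pass per iteration, while KeepRunningBatch(n) accounts for n iterations per pass, matching the batched BPF_PROG_TEST_RUN usage described above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of Google Benchmark's iteration accounting, to illustrate the
// two loop styles discussed above. ToyState is an illustrative stand-in,
// not the real benchmark::State.
struct ToyState {
    int64_t max_iterations;  // iteration budget picked by the framework
    int64_t iterations = 0;  // iterations accounted for so far

    // Range-for / KeepRunning() style: one iteration per loop pass.
    bool KeepRunning()
    {
        if (iterations >= max_iterations)
            return false;
        ++iterations;
        return true;
    }

    // Batch style: each pass accounts for n iterations, e.g. when the
    // workload itself runs n times (BPF_PROG_TEST_RUN's iterations argument).
    bool KeepRunningBatch(int64_t n)
    {
        if (iterations >= max_iterations)
            return false;
        iterations += n;
        return true;
    }
};
```

With a budget of 1000 iterations, the KeepRunning() loop body executes 1000 times, while a KeepRunningBatch(1000) loop body executes once and lets the batched workload account for the rest.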
Force-pushed from 64c0686 to 34cef1d
```cpp
Chain chain(chainName, BF_HOOK_XDP, BF_VERDICT_ACCEPT);
Set s = Set({BF_MATCHER_IP4_SADDR});

for (uint32_t i = 0; i < nelems; ++i)
    s << uint32ToIp4(i);

chain << Rule(BF_VERDICT_DROP, std::nullopt, 0,
              std::vector<Matcher> {
                  Matcher(BF_MATCHER_SET, BF_MATCHER_IN, {0, 0, 0, 0}),
              });
chain << s;

auto chainp = chain.get();
```
This can be done outside of the benchmarking loop.
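The suggested hoist (moving auto chainp = chain.get(); out of the benchmarking loop) can be sketched with a self-contained toy; expensive_get() is a hypothetical stand-in for chain.get() that counts its invocations:

```cpp
#include <cassert>

// Hypothetical stand-in for chain.get(), counting invocations.
static int g_calls = 0;
static const void *expensive_get()
{
    ++g_calls;
    return nullptr;
}

// Loop-invariant setup hoisted out of the measured loop: expensive_get()
// runs once, instead of once per iteration.
static int run_hoisted(int iterations)
{
    g_calls = 0;
    const void *chainp = expensive_get(); // hoisted out of the loop
    for (int i = 0; i < iterations; ++i)
        (void)chainp; // the measured body would use chainp here
    return g_calls;
}
```

Besides keeping the setup cost out of the timed region, the hoist also makes it obvious that the value is identical across iterations.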
```cpp
ret = bf_chain_flush(chainName.c_str());
if (ret < 0) {
    state.SkipWithError("failed to flush chain");
    break;
}
```
Use:

```cpp
{
    benchmark::ScopedPauseTiming pause(state); // Pauses timing
    ret = bf_chain_flush(chainName.c_str());
    if (ret < 0) {
        state.SkipWithError("failed to flush chain");
        break;
    }
}
```

So you don't have to manually set the iteration time.
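For reference, upstream Google Benchmark exposes state.PauseTiming() and state.ResumeTiming(); ScopedPauseTiming reads like a RAII wrapper over those (possibly a project-local helper). A self-contained sketch of such a wrapper, using a minimal State stand-in rather than the real benchmark::State:

```cpp
#include <cassert>

// Minimal stand-in for benchmark::State, tracking pause/resume calls.
struct State {
    int pauses = 0;
    int resumes = 0;
    void PauseTiming() { ++pauses; }
    void ResumeTiming() { ++resumes; }
};

// RAII helper: pause timing on construction, resume on destruction, so
// untimed cleanup such as bf_chain_flush() can run inside a scoped block.
class ScopedPauseTiming {
public:
    explicit ScopedPauseTiming(State &state) : state_(state)
    {
        state_.PauseTiming();
    }
    ~ScopedPauseTiming() { state_.ResumeTiming(); }

    ScopedPauseTiming(const ScopedPauseTiming &) = delete;
    ScopedPauseTiming &operator=(const ScopedPauseTiming &) = delete;

private:
    State &state_;
};
```

Scoping the pause to a block guarantees timing resumes even on early exits from the block, which is the point of the RAII form.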
```cpp
    ->Arg(1 << 7)
    ->Arg(1 << 15);

void chain_load__ip4_saddr__x_elem_set(::benchmark::State &state)
```
Suggested name: chain_set__ip4_saddr__x_elem_set
```cpp
const std::string chainName = "bf_benchmark";
uint32_t nelems = state.range(0);

for (auto _: state) {
```
```cpp
    ->Arg(1 << 3)
    ->Arg(1 << 7)
    ->Arg(1 << 15)
```
Let's use different values to have a clear overview of the perf here:
- 1 << 3 -> 8
- 1 << 16 -> 65k
- 1 << 22 -> ~4M
Measure the time it takes to load a chain that contains a large set. This will be a common operation (especially since sets can be updated).
Requires #532.