ip,ip6: add rate limiting for ICMP errors, ARP and ICMP input#576
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough: Adds per-path rate limiting and selective graph config updates.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/api/gr_infra.h`:
- Around line 391-393: The three rate fields (icmp_error_rate, arp_rate,
icmp_rate) in struct gr_graph_conf cannot use 0 both as “disable” and
“unchanged” because GR_GRAPH_CONF_SET treats zeroed members as “unchanged”;
modify the API to add explicit presence flags (e.g., a bitmask or bools like
has_icmp_error_rate/has_arp_rate/has_icmp_rate) to the struct and update the
setter/serialization logic (the code path used by GR_GRAPH_CONF_SET and callers
such as modules/infra/cli/graph.c) to consult these presence bits when encoding
requests so a caller can send has_*=true with value 0 to mean “disable” while
has_*=false means “leave unchanged.”
In `@modules/infra/datapath/gr_datapath.h`:
- Around line 18-31: The rate limiter currently allows a whole vector when
tokens<nb_pkts and doesn't apply a runtime decrease of max_rate; in rate_limited
(and struct rate_limit_ctx usage) change the logic so: if max_rate == 0 return
true (reject all); on refill (now - last_refill >= rte_get_tsc_hz()) set
ctx->tokens = max_rate and update last_refill; otherwise clamp ctx->tokens =
RTE_MIN(ctx->tokens, max_rate) to handle runtime decreases; then if ctx->tokens
< nb_pkts return true (reject the batch) else subtract nb_pkts from ctx->tokens
(ctx->tokens -= nb_pkts); this ensures batches larger than remaining tokens are
rejected and lowers-of-rate take effect immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6713d6cb-850c-47c1-a541-4e2c5fcf04e0
📒 Files selected for processing (10)
- modules/infra/api/gr_infra.h
- modules/infra/cli/graph.c
- modules/infra/control/gr_graph.h
- modules/infra/control/graph.c
- modules/infra/datapath/gr_datapath.h
- modules/ip/datapath/arp_input.c
- modules/ip/datapath/icmp_input.c
- modules/ip/datapath/ip_error.c
- modules/ip6/datapath/icmp6_input.c
- modules/ip6/datapath/ip6_error.c
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/control/graph.h`:
- Line 40: The global mutable declaration extern struct gr_graph_conf graph_conf
exposes graph_conf (type gr_graph_conf) to concurrent reads of graph_conf.*_rate
from datapath workers while modules/infra/control/graph.c mutates it, causing
data races; fix by removing the shared mutable global and either (a) change the
*_rate fields to atomic types and update them via atomic_store/atomic_load, or
(b) stop in-place updates and publish an immutable snapshot (allocate a new
gr_graph_conf, populate it in graph.c, then publish via an atomic pointer swap
and provide an accessor like gr_get_graph_conf_snapshot()) so datapath readers
use the snapshot or atomic loads instead of reading a mutable graph_conf
directly.
In `@modules/infra/datapath/datapath.h`:
- Around line 22-31: The limiter currently ignores upward changes to max_rate
because the code returns early when max_rate==0 and only refills tokens on a
periodic timer; update the logic so that when max_rate increases (including 0 ->
N) you immediately reseed the token bucket by setting ctx->tokens = max_rate and
ctx->last_refill = now before packet-drop decisions. Concretely, detect an
upward change by comparing the new max_rate to the previously configured value
stored in the ctx (e.g., ctx->max_rate or a saved prev_rate), perform the
immediate reseed, then update ctx->max_rate to the new value; keep the periodic
refill logic for normal operation and preserve the existing behavior when
max_rate is decreased or set to 0.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3cb6146b-de9e-432e-944b-a9df65b62a27
📒 Files selected for processing (10)
- modules/infra/api/gr_infra.h
- modules/infra/cli/graph.c
- modules/infra/control/graph.c
- modules/infra/control/graph.h
- modules/infra/datapath/datapath.h
- modules/ip/datapath/arp_input.c
- modules/ip/datapath/icmp_input.c
- modules/ip/datapath/ip_error.c
- modules/ip6/datapath/icmp6_input.c
- modules/ip6/datapath/ip6_error.c
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/ip/datapath/icmp_input.c
- modules/infra/control/graph.c
- modules/infra/api/gr_infra.h
    rte_edge_t gr_node_attach_parent(const char *parent, const char *node);

    extern struct gr_graph_conf graph_conf;
graph_conf is now a live cross-thread data race.
modules/infra/control/graph.c updates this struct in place, while the new datapath fast paths read graph_conf.*_rate on every vector. Exporting it as a plain global here makes those control-plane writes race with worker-lcore reads. Please publish the rate fields through atomics or swap a read-only snapshot instead of sharing a mutable struct.
    if (max_rate == 0)
        return false;

    if (now - ctx->last_refill >= rte_get_tsc_hz()) {
        ctx->tokens = max_rate;
        ctx->last_refill = now;
    } else if (ctx->tokens == 0) {
        return true;
    }
    ctx->tokens -= RTE_MIN(ctx->tokens, nb_pkts);
Upward rate changes are not applied immediately.
Line 22 leaves the bucket state untouched when the limiter is disabled, and Line 25 is the only place that replenishes tokens. If a node has already drained its bucket and the operator later increases max_rate — especially 0 -> N — Line 28 will keep dropping whole vectors until the stale 1-second refill window expires. That contradicts the stated “take effect immediately” behavior and can blackhole traffic right after reconfiguration. Please reseed the bucket on upward changes instead of waiting for the next periodic refill.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/api/gr_infra.h`:
- Around line 388-394: The enum gr_graph_conf_set_attr_t is declared as uint16_t
but only GR_GRAPH_SET_RX_BURST uses GR_BIT16 while the other flags use GR_BIT64;
change GR_BIT64 to GR_BIT16 for GR_GRAPH_SET_VECTOR, GR_GRAPH_SET_ICMP_ERROR,
GR_GRAPH_SET_ARP and GR_GRAPH_SET_ICMP so all bit macros match the uint16_t base
type and preserve unique bit positions within the 16-bit range.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9350d67f-f12f-46e1-9fee-922dd53ea5e8
📒 Files selected for processing (10)
- modules/infra/api/gr_infra.h
- modules/infra/cli/graph.c
- modules/infra/control/graph.c
- modules/infra/control/graph.h
- modules/infra/datapath/datapath.h
- modules/ip/datapath/arp_input.c
- modules/ip/datapath/icmp_input.c
- modules/ip/datapath/ip_error.c
- modules/ip6/datapath/icmp6_input.c
- modules/ip6/datapath/ip6_error.c
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/infra/control/graph.h
- modules/infra/datapath/datapath.h
- modules/infra/control/graph.c
Force-pushed ac4daa0 to a70f4ff (Compare)
The graph config API uses zero to mean "don't change this field" for rx_burst_max and vector_max. Future fields may need zero as a valid value, so the convention does not scale.

Add a set_attrs bitmask to the config set request so that the handler knows which fields the client explicitly provided. Move validation (zero and overflow checks) inside the per-field blocks so they only apply to fields being changed.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Under a flood of bad packets (TTL=1, no-route, etc.), every packet triggers expensive ICMP error construction: address lookups, header prepend with potential memmove, and checksum computation. Similarly, ARP and ICMP/ICMPv6 input processing can saturate the control plane thread when flooded. None of this is rate limited today.

Add a per-node token bucket to all ICMP error nodes, ARP input, and ICMP/ICMPv6 input nodes. The bucket refills once per second and the check runs before any per-packet work. When the bucket is empty, the whole vector is moved to a dedicated drop node via rte_node_next_stream_move(), avoiding any unnecessary processing.

Three separate rates are configurable via the graph config API: icmp-error-rate, arp-rate and icmp-rate. All default to 1000 packets/sec per node per worker. Setting a rate to 0 disables the limit. Rate changes take effect immediately without graph reload.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
    // Must be bumped when making non-backward compatible changes in API headers
    -#define GR_API_VERSION 2
    +#define GR_API_VERSION 3
As gr_graph_conf_set_attr_t and gr_graph_conf are modified, do we need to bump this GR_API_VERSION manually ?
Yes, we do. Otherwise the API check complains about non-compatible changes made without explicitly bumping the API version.
Checking for API changes between v0.15.0-35-g84e372a9 and v0.15.0-36-gc8cc65e7
1 function with some indirect sub-type change:
[C] 'function gr_graph_conf* GR_GRAPH_CONF_GET_(gr_empty*)' at gr_infra.h:420:1 has some indirect sub-type changes:
return type changed:
in pointed to type 'struct gr_graph_conf' at gr_infra.h:412:1:
type size changed from 4 to 10 (in bytes)
3 data member insertions:
'uint16_t icmp_error_rate', at offset 4 (in bytes) at gr_infra.h:415:1
'uint16_t arp_rate', at offset 6 (in bytes) at gr_infra.h:416:1
'uint16_t icmp_rate', at offset 8 (in bytes) at gr_infra.h:417:1
1 changed type unreachable from any public interface:
[C] 'struct gr_graph_conf_set_req' changed:
type size changed from 6 to 12 (in bytes)
2 data member changes:
anonymous data member at offset 0 (in bytes) changed from:
struct gr_graph_conf{uint16_t gr_graph_conf::rx_burst_max; uint16_t gr_graph_conf::vector_max;}
to:
struct gr_graph_conf{uint16_t gr_graph_conf::rx_burst_max; uint16_t gr_graph_conf::vector_max; uint16_t gr_graph_conf::icmp_error_rate; uint16_t gr_graph_conf::arp_rate; uint16_t gr_graph_conf::icmp_rate;}
and size changed from 4 to 10 (in bytes) (by +6 bytes)
'gr_graph_conf_set_attr_t set_attrs' offset changed from 4 to 10 (in bytes) (by +6 bytes)
breaking API changes
GR_API_VERSION changed from 2 to 3
    with_help(
        "ICMP errors/sec per node per worker (0 = no limit).",
        ec_node_uint("ICMP_ERROR_RATE", 0, UINT16_MAX, 10)
    ),
    with_help(
        "ARP packets/sec per worker (0 = no limit).",
        ec_node_uint("ARP_RATE", 0, UINT16_MAX, 10)
    ),
    with_help(
        "ICMP/ICMPv6 input packets/sec per worker (0 = no limit).",
        ec_node_uint("ICMP_RATE", 0, UINT16_MAX, 10)
What about setting the lower bound to RTE_GRAPH_BURST_SIZE, to give the user an explicit minimum value?
With the current code (`ctx->tokens -= RTE_MIN(ctx->tokens, nb_pkts);`), if the user sets the number of tokens to 1, we may still send up to RTE_GRAPH_BURST_SIZE errors, ICMP packets, etc.
But then how would you disable it?
Indeed, the limit is soft and we allow the full burst to pass even if there are fewer than nb_pkts tokens left in the bucket. But since it is transient, I thought it was fine.
    if (now - ctx->last_refill >= rte_get_tsc_hz()) {
        ctx->tokens = max_rate;
        ctx->last_refill = now;
    } else if (ctx->tokens == 0) {
Just a minor detail, shouldn’t the refill be proportional?
Proportional to what? I am not familiar with token bucket algorithms; I implemented the most basic solution.
It is not far from what you have added, simply:
- define the refill rate of your bucket --> the number of msgs per second
- define the capacity of your bucket --> in idle state this allows some instantaneous burst behavior

Your refill should be similar to:

    add_tokens = (now - ctx->last_refill) * max_rate / rte_get_tsc_hz(); // proportional to elapsed ticks
    // (a granularity guard can be added here to skip short elapsed times)
    if (!add_tokens)
        return;
    ctx->tokens += add_tokens;
    ctx->tokens = RTE_MIN(ctx->tokens, ctx->bucket_capacity);
    ctx->last_refill = now;
I just found the kernel implementation to be a great example to take inspiration from:
- https://docs.kernel.org/networking/ip-sysctl.html
- https://github.com/torvalds/linux/blob/dca922e019dd758b4c1b4bec8f1d509efddeaab4/net/ipv4/icmp.c#L229

Mapping to this patch:
- net->ipv4.icmp_global_credit --> your tokens
- net->ipv4.sysctl_icmp_msgs_per_sec --> refill rate
- net->ipv4.icmp_global_stamp --> last refill
- net->ipv4.sysctl_icmp_msgs_burst --> bucket capacity
Under a flood of bad packets (TTL=1, no-route, etc.), every packet triggers expensive ICMP error construction: address lookups, header prepend with potential memmove, and checksum computation. Similarly, ARP and ICMP/ICMPv6 input processing can saturate the control plane thread when flooded. None of this is rate limited today.
Add a per-node token bucket to all ICMP error nodes, ARP input, and ICMP/ICMPv6 input nodes. The bucket refills once per second and the check runs before any per-packet work. When the bucket is empty, the whole vector is moved to a dedicated drop node via rte_node_next_stream_move(), avoiding any unnecessary processing.
Three separate rates are configurable via the graph config API: icmp-error-rate, arp-rate and icmp-rate. All default to 1000 packets/sec per node per worker. Setting a rate to 0 disables the limit. Rate changes take effect immediately without graph reload.
Closes: #14