Vf macs#579
Conversation
|
(just sent a fix as the unit test was missing an update for ASan / leak detection) |
|
(fixed another stupid bug, introduced by rebase) |
| m->mac = *mac; | ||
| new_m = (struct port_mac) {.refcnt = 1, .mac = *mac}; | ||
| gr_vec_add(port->filter.macs, new_m); | ||
| m = &port->filter.macs[gr_vec_len(port->filter.macs) - 1]; |
There was a problem hiding this comment.
Just for readability:
m = &gr_vec_last(port->filter.macs);There was a problem hiding this comment.
../modules/infra/control/port.c: In function ‘port_mac_add’:
../modules/infra/control/port.c:694:13: error: lvalue required as unary ‘&’ operand
694 | m = &gr_vec_last(port->filter.macs);
| ^
There was a problem hiding this comment.
I can add a gr_vec_last_ref.
| for (unsigned i = 0; i < port->filter.count - 1; i++) { | ||
| for (unsigned i = 0; i < gr_vec_len(port->filter.macs) - 1; i++) { | ||
| m = &port->filter.macs[i]; |
There was a problem hiding this comment.
The last element must be skipped.
So something like:
- for (unsigned i = 0; i < gr_vec_len(port->filter.macs) - 1; i++) {
- m = &port->filter.macs[i];
+ gr_vec_foreach_ref(m, port->filter.macs) {
+ if (m == gr_vec_last_ref(port->filter.macs))
+ break;
There was a problem hiding this comment.
Ah I had missed that detail. Yes, why not with gr_vec_last_ref
There was a problem hiding this comment.
Ok, it was the main reason why I went with writting those loops that way.
I am not sure gr_vec_last_ref() keeps the philosophy of the original gr_vec API.
For example deleting an element requires an index.
But other than that, it is more readable.
|
|
||
| // Reinstall all addresses after disabling promisc. | ||
| for (i = 0; i < port->filter.count; i++) { | ||
| for (i = 0; i < gr_vec_len(port->filter.macs); i++) { |
| struct { | ||
| mac_filter_flags_t flags; | ||
| unsigned hw_limit; | ||
| gr_vec struct port_mac *macs; | ||
| } filter; |
There was a problem hiding this comment.
now that there is a single gr_vec in this inline struct, we could get rid of it.
| LOG(DEBUG, | ||
| "failed to %s mac " ETH_F " to %s: %s", | ||
| add ? "add" : "del", | ||
| mac, | ||
| iface->name, | ||
| strerror(errno)); |
There was a problem hiding this comment.
do not log anything if ret == 0
There was a problem hiding this comment.
Ah yes, the if (ret < 0) got dropped.
|
Changelog since last push:
|
|
(rebased after gr_vec rename) |
|
(rebased after API update for event registration) |
|
(rebased, and refactored the series quite extensively, now mac tracking is done all along a iface chain, and cli commands have been added for both manual setup and debugging) |
After refactoring multicast filtering, there is no need for wrapping some symbols. Signed-off-by: David Marchand <david.marchand@redhat.com>
refcount integer overflowing would be a nightmare to debug. Add explicit checks. Signed-off-by: David Marchand <david.marchand@redhat.com>
The current code around secondary mac addresses is buggy if a hw limit is reached once, and mac addresses are removed under this limit: on the first removal, hw_limit is reset to 0 but never updated when re-adding the mac addresses. On the other hand, trying to evaluate the hw limit at runtime is tricky, as the hardware may reply differently depending on global resources availability. For example, when using VFs on Intel nics, all VFs shared a maximum number of 32k switching rules. For those reasons, prefer tracking if mac addresses got into the hardware or not. On deletion, try to add more mac, and disable promisc when all of them are in hardware. Signed-off-by: David Marchand <david.marchand@redhat.com>
Move the tests around port mac addresses one layer higher by calling iface_XX API. For now, only port type is tested, we can extend those tests with other iface types later. Signed-off-by: David Marchand <david.marchand@redhat.com>
This avoids revalidating it in the interface specific ops. The check for adding multicast mac addresses in the vlan iface code was left behind from when only multicast addresses mattered. No code directly calls ->add_eth_addr op, so validate the mac input parameter at the iface API level. Besides, checking if the passed mac address is the same as the primary mac can be moved in a single location. Signed-off-by: David Marchand <david.marchand@redhat.com>
Only tracking mac addresses at the port level (even with a refcount) is broken when considering stacked interfaces, and ifaces reconfigurations. Make the concept generic by introducing a vector of mac addresses at the iface level. By moving the logic to iface level, .add_eth_addr/.del_eth_addr are now only called on the first addition/last removal of a mac address. We can then remove bond and port specific mac tracking. And finally as a bonus, this removes the limit to RTE_ETH_NUM_RECEIVE_MAC_ADDR macs in port ifaces. Note: RTE_ETH_NUM_RECEIVE_MAC_ADDR is a confusing macro that was introduced for VMDq support in DPDK, while many nics support more mac addresses. Signed-off-by: David Marchand <david.marchand@redhat.com>
f81b3f6 to
6ac7154
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds refcounted per-interface MAC management: new Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/infra/control/iface.c (1)
721-727:⚠️ Potential issue | 🔴 CriticalUnwind tracked MAC filters before freeing
iface->macs.Line 727 frees the refcount table without deleting the tracked filters. For interfaces whose MAC hooks propagate state, such as VLANs propagating to the parent port, destroying the interface can leak parent refcounts and leave stale hardware filters installed.
🐛 Suggested cleanup shape
type = iface_type_get(iface->type); assert(type != NULL); + while (vec_len(iface->macs) > 0) { + struct rte_ether_addr mac = iface->macs[0].mac; + uint8_t refcnt = iface->macs[0].refcnt; + + for (uint8_t i = 0; i < refcnt; i++) { + if (iface_del_eth_addr(iface, &mac) < 0) { + LOG(WARNING, "%s: cleanup mac " ETH_F ": %s", + iface->name, &mac, strerror(errno)); + break; + } + } + } ret = type->fini(iface); free(iface->name); free(iface->description); vec_free(iface->subinterfaces); vec_free(iface->macs);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/iface.c` around lines 721 - 727, Before freeing iface->macs, unwind any tracked MAC filters to remove propagated refcounts and hardware filters: locate the teardown sequence around iface_type_get(...)->fini(iface) and the frees of iface->name, iface->description and iface->macs, and insert a loop that walks iface->macs and calls the module's MAC-untrack/unref function for each entry (the same API used when removing MAC filters elsewhere in the codebase) to decrement parent refcounts and remove hardware state; only after that call vec_free(iface->macs) as shown.
🧹 Nitpick comments (1)
modules/infra/api/iface.c (1)
142-185: Consider bypassing the iface walk wheniface_idis set.When
req->iface_id != GR_IFACE_ID_UNDEF, the handler still walks every interface and filters inside the loop. A short-circuit viaiface_from_id(req->iface_id)would be O(1) and also returnENODEVfor a bad id instead of silently streaming nothing. Correctness-wise the current behavior is acceptable, flagging only because it diverges fromiface_get/iface_delsemantics for unknown ids.♻️ Sketch
- while ((iface = iface_next(GR_IFACE_TYPE_UNDEF, iface)) != NULL) { - if (req->iface_id != GR_IFACE_ID_UNDEF && iface->id != req->iface_id) - continue; + if (req->iface_id != GR_IFACE_ID_UNDEF) { + iface = iface_from_id(req->iface_id); + if (iface == NULL) + return api_out(ENODEV, 0, NULL); + } + do { + if (req->iface_id == GR_IFACE_ID_UNDEF + && (iface = iface_next(GR_IFACE_TYPE_UNDEF, iface)) == NULL) + break; ... - } + } while (req->iface_id == GR_IFACE_ID_UNDEF);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/api/iface.c` around lines 142 - 185, The handler iface_mac_list currently walks all interfaces then filters by req->iface_id; change it to short-circuit when req->iface_id != GR_IFACE_ID_UNDEF by calling iface_from_id(req->iface_id) and processing only that iface (or set ret = ENODEV and goto out if iface_from_id returns NULL) instead of using iface_next over all interfaces; keep the existing logic for sending primary MAC via iface_get_eth_addr and for iterating vec_foreach_ref over iface->macs and calling api_send, and in the no-id case preserve the iface_next loop and final api_out(ret, 0, NULL) behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/control/iface.c`:
- Around line 601-604: The code currently removes the MAC entry from iface->macs
unconditionally after calling type->del_eth_addr(iface, m); change the logic to
only call vec_del(iface->macs, m - iface->macs) when type->del_eth_addr
succeeded (ret == 0); if del_eth_addr fails, preserve the refcounted entry in
iface->macs and return the error (propagate ret) so control-plane state remains
consistent. Use the existing variables type->del_eth_addr, iface->macs, m and
ret to implement this conditional removal.
In `@modules/infra/control/port.c`:
- Around line 687-695: port_mac_del currently swallows errors from
rte_eth_dev_mac_addr_remove which allows the caller to remove MAC tracking even
if the NIC removal failed; modify port_mac_del so that when
rte_eth_dev_mac_addr_remove(port->port_id, &m->mac) returns a negative error
(except -ENOTSUP) it logs as before but then returns that error code to the
caller instead of falling through to return 0. Update the function port_mac_del
(and keep existing check for m->hardware and -ENOTSUP) to propagate the failure
(return ret) so callers that remove the MAC from tracking (caller in iface.c)
can abort on hardware removal failure.
In `@modules/l2/control/fdb.c`:
- Around line 352-373: The fdb_event_cb currently treats GR_EVENT_FDB_DEL as an
add because it calls push_mac_to_hw(member, &fdb->mac, event !=
GR_EVENT_FDB_UPDATE); change the logic so deletes are handled as hardware
removals by passing a remove flag that checks for GR_EVENT_FDB_DEL explicitly
(e.g., replace the third argument with a boolean expression using
GR_EVENT_FDB_DEL or otherwise map
GR_EVENT_FDB_ADD/GR_EVENT_FDB_UPDATE/GR_EVENT_FDB_DEL correctly), updating
fdb_event_cb and the call to push_mac_to_hw to ensure delete events cause
removal rather than addition.
---
Outside diff comments:
In `@modules/infra/control/iface.c`:
- Around line 721-727: Before freeing iface->macs, unwind any tracked MAC
filters to remove propagated refcounts and hardware filters: locate the teardown
sequence around iface_type_get(...)->fini(iface) and the frees of iface->name,
iface->description and iface->macs, and insert a loop that walks iface->macs and
calls the module's MAC-untrack/unref function for each entry (the same API used
when removing MAC filters elsewhere in the codebase) to decrement parent
refcounts and remove hardware state; only after that call vec_free(iface->macs)
as shown.
---
Nitpick comments:
In `@modules/infra/api/iface.c`:
- Around line 142-185: The handler iface_mac_list currently walks all interfaces
then filters by req->iface_id; change it to short-circuit when req->iface_id !=
GR_IFACE_ID_UNDEF by calling iface_from_id(req->iface_id) and processing only
that iface (or set ret = ENODEV and goto out if iface_from_id returns NULL)
instead of using iface_next over all interfaces; keep the existing logic for
sending primary MAC via iface_get_eth_addr and for iterating vec_foreach_ref
over iface->macs and calling api_send, and in the no-id case preserve the
iface_next loop and final api_out(ret, 0, NULL) behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 546fbf28-cbc4-4483-9e75-881082bb9333
📒 Files selected for processing (18)
modules/infra/api/gr_infra.hmodules/infra/api/iface.cmodules/infra/cli/mac.cmodules/infra/cli/meson.buildmodules/infra/control/bond.cmodules/infra/control/bond.hmodules/infra/control/iface.cmodules/infra/control/iface.hmodules/infra/control/iface_test.cmodules/infra/control/meson.buildmodules/infra/control/port.cmodules/infra/control/port.hmodules/infra/control/port_test.cmodules/infra/control/vlan.cmodules/l2/control/bridge.cmodules/l2/control/fdb.cmodules/l2/control/l2.hsmoke/iface_mac_test.sh
💤 Files with no reviewable changes (3)
- modules/infra/control/bond.h
- modules/infra/control/port.h
- modules/infra/control/port_test.c
| } else { | ||
| ret = type->del_eth_addr(iface, m); | ||
| vec_del(iface->macs, m - iface->macs); | ||
| } |
There was a problem hiding this comment.
Keep the MAC entry when the delete hook fails.
Line 603 removes the refcounted entry even if type->del_eth_addr() on Line 602 fails. That loses control-plane state while the hardware or parent interface may still have the filter installed.
🐛 Proposed fix
} else {
ret = type->del_eth_addr(iface, m);
- vec_del(iface->macs, m - iface->macs);
+ if (ret == 0)
+ vec_del(iface->macs, m - iface->macs);
}
return ret;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/control/iface.c` around lines 601 - 604, The code currently
removes the MAC entry from iface->macs unconditionally after calling
type->del_eth_addr(iface, m); change the logic to only call vec_del(iface->macs,
m - iface->macs) when type->del_eth_addr succeeded (ret == 0); if del_eth_addr
fails, preserve the refcounted entry in iface->macs and return the error
(propagate ret) so control-plane state remains consistent. Use the existing
variables type->del_eth_addr, iface->macs, m and ret to implement this
conditional removal.
There was a problem hiding this comment.
Not much we can do at this point if the parent iface refused the removal.
There was a problem hiding this comment.
@david-marchand, understood — if the underlying layer rejects the removal, the MAC state is already in an unrecoverable situation and there's no safe way to reconcile it. Fair enough, I'll accept this as intentional.
🧠 Learnings used
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.
Learnt from: vjardin
Repo: DPDK/grout PR: 565
File: modules/ipip/datapath_out.c:82-86
Timestamp: 2026-03-23T23:58:22.163Z
Learning: In the grout codebase, `ip_output` and `iface_output` maintain separate iface-type dispatch tables. `ip_output_register_interface_type` registers next nodes for L3 dispatch (e.g., ipip_output for GR_IFACE_TYPE_IPIP), while `iface_output_type_register` registers L2/physical next nodes (e.g., port_output for GR_IFACE_TYPE_PORT). IPIP egress packets flow ip_output → ipip_output directly, bypassing iface_output for the IPIP logical interface. iface_output only sees the outer/physical iface for the encapsulated packet. Therefore capture_enqueue calls in ipip_output (for the IPIP logical iface) and in iface_output (for the physical iface) are NOT duplicates — they capture at different layers. Do not flag these as duplicate captures.
Learnt from: rjarry
Repo: DPDK/grout PR: 553
File: modules/infra/control/loopback.c:154-165
Timestamp: 2026-03-16T08:03:05.858Z
Learning: In the grout codebase, the `tun_pi` header prepended by the TUN device is NOT stripped in `modules/infra/control/loopback.c`. It is stripped in the datapath node `loopback_input_process` in `modules/infra/datapath/loop_input.c` via `rte_pktmbuf_adj(mbuf, sizeof(struct tun_pi))`, before the packet is dispatched to the L3 stack. Do not flag missing tun_pi stripping in loopback.c.
Learnt from: rjarry
Repo: DPDK/grout PR: 504
File: modules/l2/control/bridge.c:60-77
Timestamp: 2026-02-14T09:31:42.766Z
Learning: In the DPDK/grout codebase, `bridge_detach_member` (modules/l2/control/bridge.c) is only called during interface reconfiguration when an interface changes mode or domain, and it is always followed by another attach or VRF assignment. Therefore, it does not need to restore `vrf_id` via `vrf_default_get_or_create()`. Only `bridge_fini` (during bridge deletion) needs to explicitly restore the default VRF, as there is no subsequent attach operation.
Learnt from: rjarry
Repo: DPDK/grout PR: 466
File: modules/ip6/control/icmp6.c:173-178
Timestamp: 2025-12-18T16:04:37.486Z
Learning: In modules/ip/control/icmp.c and modules/ip6/control/icmp6.c, the mbufs stored in icmp_queue are owned by other pktmbuf pools that are freed elsewhere. It is not necessary to free these mbufs in icmp_fini/icmp6_fini before calling rte_mempool_free on the icmp_queue_item mempool.
Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:154-155
Timestamp: 2026-03-16T15:27:46.743Z
Learning: In the grout codebase, assume all mbufs are mono-segment. Therefore rte_pktmbuf_data_len(m) is equivalent to rte_pktmbuf_pkt_len(m); multi-segment mbuf concerns do not apply. Do not flag multi-segment handling issues in grout code. This guidance applies across grout C sources under modules/, and review checks should not raise multi-segment related issues for such files unless there is explicit evidence of multi-segment usage in a specific case.
| static int port_mac_del(struct iface *iface, struct iface_mac *m) { | ||
| struct iface_info_port *port = iface_info_port(iface); | ||
| struct port_mac *m; | ||
| uint8_t i; | ||
| int ret; | ||
|
|
||
| if (mac == NULL) | ||
| return errno_set(EINVAL); | ||
| if (rte_is_multicast_ether_addr(mac)) | ||
| return 0; // ALLMULTI is always on | ||
|
|
||
| if (rte_is_same_ether_addr(mac, &port->mac)) | ||
| return 0; | ||
|
|
||
| for (i = 0; i < port->filter.count; i++) { | ||
| m = &port->filter.macs[i]; | ||
| if (rte_is_same_ether_addr(&m->mac, mac)) | ||
| goto found; | ||
| } | ||
| return errno_set(ENOENT); | ||
|
|
||
| found: | ||
| if (--m->refcnt > 0) { | ||
| LOG(DEBUG, | ||
| "%s: mac " ETH_F " still filtered (refs=%u)", | ||
| iface->name, | ||
| mac, | ||
| m->refcnt); | ||
| return 0; | ||
| if (m->hardware) { | ||
| ret = rte_eth_dev_mac_addr_remove(port->port_id, &m->mac); | ||
| if (ret < 0 && ret != -ENOTSUP) | ||
| LOG(WARNING, "%s: %s", iface->name, rte_strerror(-ret)); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect the MAC delete hook call path and confirm whether state is removed after hook success.
rg -n -C4 'type->del_eth_addr\(iface, m\)|\.del_eth_addr = port_mac_del|rte_eth_dev_mac_addr_remove'Repository: DPDK/grout
Length of output: 4095
🏁 Script executed:
cat -n modules/infra/control/port.c | sed -n '687,710p'Repository: DPDK/grout
Length of output: 929
🏁 Script executed:
cat -n modules/infra/control/iface.c | sed -n '595,610p'Repository: DPDK/grout
Length of output: 439
🏁 Script executed:
rg -n 'errno_log|^int errno_log|^void errno_log' --type cRepository: DPDK/grout
Length of output: 5661
🏁 Script executed:
rg -n -A5 -B5 'will_return.*rte_eth_dev_mac_addr_remove.*-' modules/infra/control/iface_test.cRepository: DPDK/grout
Length of output: 36
🏁 Script executed:
cat -n modules/infra/control/port.c | sed -n '687,730p'Repository: DPDK/grout
Length of output: 1551
Propagate hardware MAC removal failures instead of silently continuing.
The function always returns 0 (line 726) even when rte_eth_dev_mac_addr_remove() fails. The caller immediately removes the MAC from tracking (iface.c:603) regardless of the hook's return value, leaving the NIC potentially still accepting traffic on that MAC while the app believes it's deleted.
🐛 Proposed fix
if (m->hardware) {
ret = rte_eth_dev_mac_addr_remove(port->port_id, &m->mac);
- if (ret < 0 && ret != -ENOTSUP)
- LOG(WARNING, "%s: %s", iface->name, rte_strerror(-ret));
+ if (ret < 0)
+ return errno_log(-ret, "rte_eth_dev_mac_addr_remove");
+ m->hardware = false;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/control/port.c` around lines 687 - 695, port_mac_del currently
swallows errors from rte_eth_dev_mac_addr_remove which allows the caller to
remove MAC tracking even if the NIC removal failed; modify port_mac_del so that
when rte_eth_dev_mac_addr_remove(port->port_id, &m->mac) returns a negative
error (except -ENOTSUP) it logs as before but then returns that error code to
the caller instead of falling through to return 0. Update the function
port_mac_del (and keep existing check for m->hardware and -ENOTSUP) to propagate
the failure (return ret) so callers that remove the MAC from tracking (caller in
iface.c) can abort on hardware removal failure.
There was a problem hiding this comment.
Same comment as before. I don't see what to do if removal failed.
There was a problem hiding this comment.
@david-marchand, that's a fair point — there's no sensible recovery path when rte_eth_dev_mac_addr_remove fails (the NIC is in an unknown state regardless), so forcing the caller to abort the tracking removal would only make things worse by leaving software state inconsistent with whatever hardware state actually is. Logging a warning and continuing is the right call here.
I'll drop this comment.
✏️ Learnings added
Learnt from: david-marchand
Repo: DPDK/grout PR: 579
File: modules/infra/control/port.c:687-695
Timestamp: 2026-04-23T08:46:26.517Z
Learning: In `modules/infra/control/port.c`, `port_mac_del` intentionally ignores errors from `rte_eth_dev_mac_addr_remove` (only logging a WARNING) because there is no meaningful recovery when hardware MAC filter removal fails — the MAC must be removed from software tracking regardless to avoid a worse inconsistency. Do not flag this as a missing error propagation issue.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: rjarry
Repo: DPDK/grout PR: 553
File: modules/infra/control/loopback.c:154-165
Timestamp: 2026-03-16T08:03:05.858Z
Learning: In the grout codebase, the `tun_pi` header prepended by the TUN device is NOT stripped in `modules/infra/control/loopback.c`. It is stripped in the datapath node `loopback_input_process` in `modules/infra/datapath/loop_input.c` via `rte_pktmbuf_adj(mbuf, sizeof(struct tun_pi))`, before the packet is dispatched to the L3 stack. Do not flag missing tun_pi stripping in loopback.c.
Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:156-160
Timestamp: 2026-03-17T08:11:06.305Z
Learning: In `modules/infra/datapath/port_rx.c`, the bounds check `rte_pktmbuf_data_len(m) >= csum_offset + 2` in `fix_l4_csum()` can never fail for virtio because the DPDK virtio driver guarantees all headers (including the L4 checksum field) are stored in the first segment/descriptor. Do not flag this guard as an issue in virtio RX paths.
Learnt from: rjarry
Repo: DPDK/grout PR: 504
File: modules/l2/control/bridge.c:60-77
Timestamp: 2026-02-14T09:31:42.766Z
Learning: In the DPDK/grout codebase, `bridge_detach_member` (modules/l2/control/bridge.c) is only called during interface reconfiguration when an interface changes mode or domain, and it is always followed by another attach or VRF assignment. Therefore, it does not need to restore `vrf_id` via `vrf_default_get_or_create()`. Only `bridge_fini` (during bridge deletion) needs to explicitly restore the default VRF, as there is no subsequent attach operation.
Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:154-155
Timestamp: 2026-03-16T15:27:46.743Z
Learning: In the grout codebase, assume all mbufs are mono-segment. Therefore rte_pktmbuf_data_len(m) is equivalent to rte_pktmbuf_pkt_len(m); multi-segment mbuf concerns do not apply. Do not flag multi-segment handling issues in grout code. This guidance applies across grout C sources under modules/, and review checks should not raise multi-segment related issues for such files unless there is explicit evidence of multi-segment usage in a specific case.
✅ Actions performedReview triggered.
|
rjarry
left a comment
There was a problem hiding this comment.
This looks good to me. I just wonder if we should make the fdb_sync_hardware() call effective only on ports that do not support real promisc.
Is this still a draft?
| // Send primary MAC address first | ||
| struct rte_ether_addr primary_mac; | ||
| if (iface_get_eth_addr(iface, &primary_mac) == 0) { | ||
| struct gr_iface_mac *mac = malloc(sizeof(*mac)); |
There was a problem hiding this comment.
No need to malloc here. You can use a stack variable.
|
|
||
| // Send secondary MAC addresses | ||
| vec_foreach_ref (struct iface_mac *m, iface->macs) { | ||
| struct gr_iface_mac *mac = malloc(sizeof(*mac)); |
There was a problem hiding this comment.
Use the same stack variable here.
Introduce a new top-level "mac" CLI context for managing both primary and secondary MAC addresses on interfaces. The primary MAC address can be changed with "mac set MAC iface IFACE". Secondary MAC addresses can be added and removed with "mac add" and "mac del" commands. The "mac show" command displays all MAC addresses (both primary and secondary) with their reference counts. The primary MAC is always shown first with a "-" in the REFCNT column to distinguish it from secondary addresses. This provides a unified interface for MAC address operations instead of having them scattered under different command contexts. This also helps a lot when troubleshooting Rx filters and other fun jokes about mac addresses. Signed-off-by: David Marchand <david.marchand@redhat.com>
When adding a vlan interface in a bridge, pass on the promiscuous configuration to the parent port. Signed-off-by: David Marchand <david.marchand@redhat.com>
In setups where grout physical port is a VF, traffic sent by other VFs reach grout only if the destination address is multicast or a mac address for which a rx filter was declared in hw. Make use of control path pushed FDB entries to populate hardware and let the packets flow. Signed-off-by: David Marchand <david.marchand@redhat.com>
The Rx filter approach should work in all cases. Adding a global knob could be done, but I don't see how we could discover such "real" promisc capability.
Unfortunately yes, I still want to check with mlx5 in both legacy and switchdev modes. |
|
I double checked the mac workaround with mlx5 manually. Note: it requires dpdk fix c62ff2e6d48b ("common/mlx5: fix MAC deletion on Linux") for deletions to work. As for the full switchdev setup, I am busy on other activities, but it should work as is (I expect a ENOTSUP coming from the mlx5 driver, in such a configuration). |
|
frr full synchronisation is still broken in my env, and I am not sure what is wrong... |
For BGP evpn support with Intel and NVIDIA VFs, pushing remote mac addresses in the hardware becomes needed.
This series is at draft level for now, since I did not re-test the whole thing on real hardware yet.
I also have suspicions on the mac refcount and we may have some bugs in there.
There is probably some need to display all mac filters in the cli as well.
Technical overview
Pushes externally-sourced MAC addresses (e.g., from EVPN/FDB) into hardware RX filters and refactors control-plane MAC tracking into per-interface, refcounted entries so packets destined for VFs are delivered when hardware FDB/filter entries exist.
Core changes
Per-interface refcounted MAC entries
Port hardware interaction
FDB ↔ hardware synchronization
Bond and VLAN propagation
API and CLI
Tests and smoke
Public API and type signature changes
Notes and caveats (from commits/author)