Skip to content

Vf macs#579

Open
david-marchand wants to merge 9 commits intoDPDK:mainfrom
david-marchand:vf_macs
Open

Vf macs#579
david-marchand wants to merge 9 commits intoDPDK:mainfrom
david-marchand:vf_macs

Conversation

@david-marchand
Copy link
Copy Markdown
Member

@david-marchand david-marchand commented Mar 30, 2026

For BGP evpn support with Intel and NVIDIA VFs, pushing remote mac addresses in the hardware becomes needed.

This series is at draft level for now, since I did not re-test the whole thing on real hardware yet.

I also have suspicions on the mac refcount and we may have some bugs in there.
There is probably some need to display all mac filters in the cli as well.

Technical overview

Pushes externally-sourced MAC addresses (e.g., from EVPN/FDB) into hardware RX filters and refactors control-plane MAC tracking into per-interface, refcounted entries so packets destined for VFs are delivered when hardware FDB/filter entries exist.

Core changes

  • Per-interface refcounted MAC entries

    • Introduced struct iface_mac { uint8_t refcnt; bool hardware; struct rte_ether_addr mac; } and vec struct iface_mac *iface->macs.
    • iface_add_eth_addr(iface, struct iface_mac *):
      • No-ops for multicast or an interface’s default MAC.
      • If MAC exists: increments refcnt with overflow detection and reports error on overflow; does not call type handler.
      • If new: creates entry (refcnt=1) and calls iface_type->add_eth_addr(handler) once.
    • iface_del_eth_addr(iface, struct iface_mac *):
      • If MAC exists: decrements refcnt; if refcnt>1, returns success without calling type handler; if refcnt==1, calls iface_type->del_eth_addr and removes entry.
      • Returns ENOENT when MAC not present; detects/report underflow conditions as errors.
    • iface_destroy frees iface->macs to avoid leaks.
  • Port hardware interaction

    • Port handlers now accept struct iface_mac * and set m->hardware = true when rte_eth_dev_mac_addr_add succeeds.
    • On hardware add failure due to capacity (ENOSPC) or unsupported (EOPNOTSUPP), code falls back to enabling promiscuous mode (PROMISC_FIXED) rather than failing the control-plane operation.
    • On delete, only removes hardware entry when m->hardware is true. When PROMISC_FIXED is active, attempts to reinstall remaining non-hardware MACs and disables PROMISC_FIXED only if reinstall succeeds.
  • FDB ↔ hardware synchronization

    • Added push_mac_to_hw helper and fdb_event_cb subscribed to FDB add/del events; reacts to GR_EVENT_FDB_ADD/DEL for GR_FDB_F_EXTERN entries (asserts vlan_id == 0) and pushes adds/deletes into member interfaces via iface_add_eth_addr/iface_del_eth_addr.
    • Added fdb_sync_hardware(bridge, member, add) and invoked it on bridge member attach/detach and bridge finalization to sync current FDB state to/from hardware.
  • Bond and VLAN propagation

    • Bond handlers converted to accept struct iface_mac * and propagate m->mac to all member ports (bond no longer stores extra_macs locally).
    • VLAN handlers forward MAC operations to parent via &m->mac and add a .set_promisc callback that delegates to parent iface_set_promisc(parent, enabled).
  • API and CLI

    • Infra API additions: GR_IFACE_MAC_ADD, GR_IFACE_MAC_DEL, GR_IFACE_MAC_LIST (stream), GR_IFACE_MAC_SET with corresponding request structs and registrations.
    • CLI: new mac context with add/del/set commands and show (streams GR_IFACE_MAC_LIST). show renders primary entries (primary=true, refcnt=0) then secondary entries with refcnt.
  • Tests and smoke

    • Replaced port_test.c with iface_test.c exercising the refcounted iface MAC model, PROMISC_FIXED fallback, multicast no-op behavior, NULL input validation, ENOENT semantics, and VLAN ↔ parent synchronization.
    • Added smoke/iface_mac_test.sh exercising grcli flows: add/list/del, duplicate-add refcounting, VLAN MAC propagation, and failure-mode checks.
  • Public API and type signature changes

    • Removed public declarations: port_mac_add(), port_mac_del(), port_promisc_set(); removed struct port_mac, mac_filter_flags_t, and the filter field from iface_info_port.
    • Removed bond->extra_macs field.
    • iface_type callbacks changed:
      • int (*add_eth_addr)(struct iface *, struct iface_mac *);
      • int (*del_eth_addr)(struct iface *, struct iface_mac *);
    • Exported fdb_sync_hardware(const struct iface *bridge, struct iface *member, bool add).

Notes and caveats (from commits/author)

  • Series is draft: not fully re-tested on real hardware.
  • Author suspects remaining bugs in MAC refcount handling; code now treats refcount overflow/underflow as errors.
  • Hardware deletion on mlx5 required an upstream DPDK fix for correct deletions; driver may return ENOTSUP in switchdev mode. FRR full sync remains broken in the author’s environment.

@david-marchand
Copy link
Copy Markdown
Member Author

(just sent a fix as the unit test was missing an update for ASan / leak detection)

@david-marchand
Copy link
Copy Markdown
Member Author

(fixed another stupid bug, introduced by rebase)

Comment thread modules/infra/control/port.c Outdated
Comment thread modules/infra/control/port.c Outdated
m->mac = *mac;
new_m = (struct port_mac) {.refcnt = 1, .mac = *mac};
gr_vec_add(port->filter.macs, new_m);
m = &port->filter.macs[gr_vec_len(port->filter.macs) - 1];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for readability:

m = &gr_vec_last(port->filter.macs);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

../modules/infra/control/port.c: In function ‘port_mac_add’:
../modules/infra/control/port.c:694:13: error: lvalue required as unary ‘&’ operand
694 | m = &gr_vec_last(port->filter.macs);
| ^

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a gr_vec_last_ref.

Comment thread modules/infra/control/port.c Outdated
Comment on lines 730 to 731
for (unsigned i = 0; i < port->filter.count - 1; i++) {
for (unsigned i = 0; i < gr_vec_len(port->filter.macs) - 1; i++) {
m = &port->filter.macs[i];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gr_vec_foreach_ref too

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last element must be skipped.
So something like:

-               for (unsigned i = 0; i < gr_vec_len(port->filter.macs) - 1; i++) {
-                       m = &port->filter.macs[i];
+               gr_vec_foreach_ref(m, port->filter.macs) {
+                       if (m == gr_vec_last_ref(port->filter.macs))
+                               break;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I had missed that detail. Yes, why not with gr_vec_last_ref

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it was the main reason why I went with writting those loops that way.

I am not sure gr_vec_last_ref() keeps the philosophy of the original gr_vec API.
For example deleting an element requires an index.

But other than that, it is more readable.

Comment thread modules/infra/control/port.c Outdated

// Reinstall all addresses after disabling promisc.
for (i = 0; i < port->filter.count; i++) {
for (i = 0; i < gr_vec_len(port->filter.macs); i++) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foreach_ref

Comment thread modules/infra/control/gr_port.h Outdated
Comment on lines 37 to 39
struct {
mac_filter_flags_t flags;
unsigned hw_limit;
gr_vec struct port_mac *macs;
} filter;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that there is a single gr_vec in this inline struct, we could get rid of it.

Comment thread modules/l2/control/fdb.c Outdated
Comment on lines +345 to +350
LOG(DEBUG,
"failed to %s mac " ETH_F " to %s: %s",
add ? "add" : "del",
mac,
iface->name,
strerror(errno));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not log anything if ret == 0

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, the if (ret < 0) got dropped.

@david-marchand
Copy link
Copy Markdown
Member Author

david-marchand commented Mar 31, 2026

Changelog since last push:

  • introduced gr_vec_last_ref() and preferred gr_vec_for_each_ref,
  • flattened iface_info_port structure (no more intermediate filter struct),
  • changed approach of refcount checks and reported over/under flows with an error,
  • added cleanup of mac address in hardware on port deletion, with error logs if refcnt is != 1,
  • fixed bad log in push_mac_to_hw,

@david-marchand
Copy link
Copy Markdown
Member Author

(rebased after gr_vec rename)

@david-marchand
Copy link
Copy Markdown
Member Author

(rebased after API update for event registration)

@david-marchand
Copy link
Copy Markdown
Member Author

(rebased, and refactored the series quite extensively, now mac tracking is done all along a iface chain, and cli commands have been added for both manual setup and debugging)

After refactoring multicast filtering, there is no need for wrapping
some symbols.

Signed-off-by: David Marchand <david.marchand@redhat.com>
refcount integer overflowing would be a nightmare to debug.
Add explicit checks.

Signed-off-by: David Marchand <david.marchand@redhat.com>
The current code around secondary mac addresses is buggy if a hw limit
is reached once, and mac addresses are removed under this limit:
on the first removal, hw_limit is reset to 0 but never updated when
re-adding the mac addresses.

On the other hand, trying to evaluate the hw limit at runtime is tricky,
as the hardware may reply differently depending on global resources
availability.
For example, when using VFs on Intel nics, all VFs shared a maximum
number of 32k switching rules.

For those reasons, prefer tracking if mac addresses got into the
hardware or not.

On deletion, try to add more mac, and disable promisc when all of
them are in hardware.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Move the tests around port mac addresses one layer higher
by calling iface_XX API.
For now, only port type is tested, we can extend those tests with
other iface types later.

Signed-off-by: David Marchand <david.marchand@redhat.com>
This avoids revalidating it in the interface specific ops.

The check for adding multicast mac addresses in the vlan iface code
was left behind from when only multicast addresses mattered.

No code directly calls ->add_eth_addr op, so validate the mac input
parameter at the iface API level.

Besides, checking if the passed mac address is the same as the primary
mac can be moved in a single location.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Only tracking mac addresses at the port level (even with a refcount) is
broken when considering stacked interfaces, and ifaces reconfigurations.

Make the concept generic by introducing a vector of mac addresses at the
iface level.

By moving the logic to iface level, .add_eth_addr/.del_eth_addr are now
only called on the first addition/last removal of a mac address.

We can then remove bond and port specific mac tracking.

And finally as a bonus, this removes the limit to
RTE_ETH_NUM_RECEIVE_MAC_ADDR macs in port ifaces.

Note: RTE_ETH_NUM_RECEIVE_MAC_ADDR is a confusing macro that was
introduced for VMDq support in DPDK, while many nics support more mac
addresses.

Signed-off-by: David Marchand <david.marchand@redhat.com>
@david-marchand david-marchand force-pushed the vf_macs branch 2 times, most recently from f81b3f6 to 6ac7154 Compare April 23, 2026 06:37
@david-marchand
Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 93a05813-7ac7-428e-a937-fae1f373bcf3

📥 Commits

Reviewing files that changed from the base of the PR and between 6ac7154 and 715a11b.

📒 Files selected for processing (9)
  • modules/infra/api/gr_infra.h
  • modules/infra/api/iface.c
  • modules/infra/cli/mac.c
  • modules/infra/cli/meson.build
  • modules/infra/control/vlan.c
  • modules/l2/control/bridge.c
  • modules/l2/control/fdb.c
  • modules/l2/control/l2.h
  • smoke/iface_mac_test.sh
✅ Files skipped from review due to trivial changes (1)
  • modules/infra/cli/meson.build
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/l2/control/l2.h
  • modules/l2/control/bridge.c

📝 Walkthrough

Walkthrough

Adds refcounted per-interface MAC management: new struct iface_mac and iface->macs, API requests (add/del/list/set) and CLI mac commands to manage secondary MACs, and streaming list support. Control-plane changes convert MAC add/del hooks to accept struct iface_mac *, update port/bond/vlan logic to propagate/refcount MACs, and remove prior port-level filter APIs. FDB hardware sync for bridges and unit/smoke tests validating refcounting, propagation, and error paths are included.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/control/iface.c (1)

721-727: ⚠️ Potential issue | 🔴 Critical

Unwind tracked MAC filters before freeing iface->macs.

Line 727 frees the refcount table without deleting the tracked filters. For interfaces whose MAC hooks propagate state, such as VLANs propagating to the parent port, destroying the interface can leak parent refcounts and leave stale hardware filters installed.

🐛 Suggested cleanup shape
 	type = iface_type_get(iface->type);
 	assert(type != NULL);
+	while (vec_len(iface->macs) > 0) {
+		struct rte_ether_addr mac = iface->macs[0].mac;
+		uint8_t refcnt = iface->macs[0].refcnt;
+
+		for (uint8_t i = 0; i < refcnt; i++) {
+			if (iface_del_eth_addr(iface, &mac) < 0) {
+				LOG(WARNING, "%s: cleanup mac " ETH_F ": %s",
+				    iface->name, &mac, strerror(errno));
+				break;
+			}
+		}
+	}
 	ret = type->fini(iface);
 	free(iface->name);
 	free(iface->description);
 	vec_free(iface->subinterfaces);
 	vec_free(iface->macs);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/infra/control/iface.c` around lines 721 - 727, Before freeing
iface->macs, unwind any tracked MAC filters to remove propagated refcounts and
hardware filters: locate the teardown sequence around
iface_type_get(...)->fini(iface) and the frees of iface->name,
iface->description and iface->macs, and insert a loop that walks iface->macs and
calls the module's MAC-untrack/unref function for each entry (the same API used
when removing MAC filters elsewhere in the codebase) to decrement parent
refcounts and remove hardware state; only after that call vec_free(iface->macs)
as shown.
🧹 Nitpick comments (1)
modules/infra/api/iface.c (1)

142-185: Consider bypassing the iface walk when iface_id is set.

When req->iface_id != GR_IFACE_ID_UNDEF, the handler still walks every interface and filters inside the loop. A short-circuit via iface_from_id(req->iface_id) would be O(1) and also return ENODEV for a bad id instead of silently streaming nothing. Correctness-wise the current behavior is acceptable, flagging only because it diverges from iface_get/iface_del semantics for unknown ids.

♻️ Sketch
-	while ((iface = iface_next(GR_IFACE_TYPE_UNDEF, iface)) != NULL) {
-		if (req->iface_id != GR_IFACE_ID_UNDEF && iface->id != req->iface_id)
-			continue;
+	if (req->iface_id != GR_IFACE_ID_UNDEF) {
+		iface = iface_from_id(req->iface_id);
+		if (iface == NULL)
+			return api_out(ENODEV, 0, NULL);
+	}
+	do {
+		if (req->iface_id == GR_IFACE_ID_UNDEF
+		    && (iface = iface_next(GR_IFACE_TYPE_UNDEF, iface)) == NULL)
+			break;
 		...
-	}
+	} while (req->iface_id == GR_IFACE_ID_UNDEF);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/infra/api/iface.c` around lines 142 - 185, The handler iface_mac_list
currently walks all interfaces then filters by req->iface_id; change it to
short-circuit when req->iface_id != GR_IFACE_ID_UNDEF by calling
iface_from_id(req->iface_id) and processing only that iface (or set ret = ENODEV
and goto out if iface_from_id returns NULL) instead of using iface_next over all
interfaces; keep the existing logic for sending primary MAC via
iface_get_eth_addr and for iterating vec_foreach_ref over iface->macs and
calling api_send, and in the no-id case preserve the iface_next loop and final
api_out(ret, 0, NULL) behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/infra/control/iface.c`:
- Around line 601-604: The code currently removes the MAC entry from iface->macs
unconditionally after calling type->del_eth_addr(iface, m); change the logic to
only call vec_del(iface->macs, m - iface->macs) when type->del_eth_addr
succeeded (ret == 0); if del_eth_addr fails, preserve the refcounted entry in
iface->macs and return the error (propagate ret) so control-plane state remains
consistent. Use the existing variables type->del_eth_addr, iface->macs, m and
ret to implement this conditional removal.

In `@modules/infra/control/port.c`:
- Around line 687-695: port_mac_del currently swallows errors from
rte_eth_dev_mac_addr_remove which allows the caller to remove MAC tracking even
if the NIC removal failed; modify port_mac_del so that when
rte_eth_dev_mac_addr_remove(port->port_id, &m->mac) returns a negative error
(except -ENOTSUP) it logs as before but then returns that error code to the
caller instead of falling through to return 0. Update the function port_mac_del
(and keep existing check for m->hardware and -ENOTSUP) to propagate the failure
(return ret) so callers that remove the MAC from tracking (caller in iface.c)
can abort on hardware removal failure.

In `@modules/l2/control/fdb.c`:
- Around line 352-373: The fdb_event_cb currently treats GR_EVENT_FDB_DEL as an
add because it calls push_mac_to_hw(member, &fdb->mac, event !=
GR_EVENT_FDB_UPDATE); change the logic so deletes are handled as hardware
removals by passing a remove flag that checks for GR_EVENT_FDB_DEL explicitly
(e.g., replace the third argument with a boolean expression using
GR_EVENT_FDB_DEL or otherwise map
GR_EVENT_FDB_ADD/GR_EVENT_FDB_UPDATE/GR_EVENT_FDB_DEL correctly), updating
fdb_event_cb and the call to push_mac_to_hw to ensure delete events cause
removal rather than addition.

---

Outside diff comments:
In `@modules/infra/control/iface.c`:
- Around line 721-727: Before freeing iface->macs, unwind any tracked MAC
filters to remove propagated refcounts and hardware filters: locate the teardown
sequence around iface_type_get(...)->fini(iface) and the frees of iface->name,
iface->description and iface->macs, and insert a loop that walks iface->macs and
calls the module's MAC-untrack/unref function for each entry (the same API used
when removing MAC filters elsewhere in the codebase) to decrement parent
refcounts and remove hardware state; only after that call vec_free(iface->macs)
as shown.

---

Nitpick comments:
In `@modules/infra/api/iface.c`:
- Around line 142-185: The handler iface_mac_list currently walks all interfaces
then filters by req->iface_id; change it to short-circuit when req->iface_id !=
GR_IFACE_ID_UNDEF by calling iface_from_id(req->iface_id) and processing only
that iface (or set ret = ENODEV and goto out if iface_from_id returns NULL)
instead of using iface_next over all interfaces; keep the existing logic for
sending primary MAC via iface_get_eth_addr and for iterating vec_foreach_ref
over iface->macs and calling api_send, and in the no-id case preserve the
iface_next loop and final api_out(ret, 0, NULL) behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 546fbf28-cbc4-4483-9e75-881082bb9333

📥 Commits

Reviewing files that changed from the base of the PR and between 6d9e2ef and 6ac7154.

📒 Files selected for processing (18)
  • modules/infra/api/gr_infra.h
  • modules/infra/api/iface.c
  • modules/infra/cli/mac.c
  • modules/infra/cli/meson.build
  • modules/infra/control/bond.c
  • modules/infra/control/bond.h
  • modules/infra/control/iface.c
  • modules/infra/control/iface.h
  • modules/infra/control/iface_test.c
  • modules/infra/control/meson.build
  • modules/infra/control/port.c
  • modules/infra/control/port.h
  • modules/infra/control/port_test.c
  • modules/infra/control/vlan.c
  • modules/l2/control/bridge.c
  • modules/l2/control/fdb.c
  • modules/l2/control/l2.h
  • smoke/iface_mac_test.sh
💤 Files with no reviewable changes (3)
  • modules/infra/control/bond.h
  • modules/infra/control/port.h
  • modules/infra/control/port_test.c

Comment on lines +601 to +604
} else {
ret = type->del_eth_addr(iface, m);
vec_del(iface->macs, m - iface->macs);
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Keep the MAC entry when the delete hook fails.

Line 603 removes the refcounted entry even if type->del_eth_addr() on Line 602 fails. That loses control-plane state while the hardware or parent interface may still have the filter installed.

🐛 Proposed fix
 		} else {
 			ret = type->del_eth_addr(iface, m);
-			vec_del(iface->macs, m - iface->macs);
+			if (ret == 0)
+				vec_del(iface->macs, m - iface->macs);
 		}
 
 		return ret;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/infra/control/iface.c` around lines 601 - 604, The code currently
removes the MAC entry from iface->macs unconditionally after calling
type->del_eth_addr(iface, m); change the logic to only call vec_del(iface->macs,
m - iface->macs) when type->del_eth_addr succeeded (ret == 0); if del_eth_addr
fails, preserve the refcounted entry in iface->macs and return the error
(propagate ret) so control-plane state remains consistent. Use the existing
variables type->del_eth_addr, iface->macs, m and ret to implement this
conditional removal.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much we can do at this point if the parent iface refused the removal.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-marchand, understood — if the underlying layer rejects the removal, the MAC state is already in an unrecoverable situation and there's no safe way to reconcile it. Fair enough, I'll accept this as intentional.


🧠 Learnings used
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Learnt from: vjardin
Repo: DPDK/grout PR: 565
File: modules/ipip/datapath_out.c:82-86
Timestamp: 2026-03-23T23:58:22.163Z
Learning: In the grout codebase, `ip_output` and `iface_output` maintain separate iface-type dispatch tables. `ip_output_register_interface_type` registers next nodes for L3 dispatch (e.g., ipip_output for GR_IFACE_TYPE_IPIP), while `iface_output_type_register` registers L2/physical next nodes (e.g., port_output for GR_IFACE_TYPE_PORT). IPIP egress packets flow ip_output → ipip_output directly, bypassing iface_output for the IPIP logical interface. iface_output only sees the outer/physical iface for the encapsulated packet. Therefore capture_enqueue calls in ipip_output (for the IPIP logical iface) and in iface_output (for the physical iface) are NOT duplicates — they capture at different layers. Do not flag these as duplicate captures.

Learnt from: rjarry
Repo: DPDK/grout PR: 553
File: modules/infra/control/loopback.c:154-165
Timestamp: 2026-03-16T08:03:05.858Z
Learning: In the grout codebase, the `tun_pi` header prepended by the TUN device is NOT stripped in `modules/infra/control/loopback.c`. It is stripped in the datapath node `loopback_input_process` in `modules/infra/datapath/loop_input.c` via `rte_pktmbuf_adj(mbuf, sizeof(struct tun_pi))`, before the packet is dispatched to the L3 stack. Do not flag missing tun_pi stripping in loopback.c.

Learnt from: rjarry
Repo: DPDK/grout PR: 504
File: modules/l2/control/bridge.c:60-77
Timestamp: 2026-02-14T09:31:42.766Z
Learning: In the DPDK/grout codebase, `bridge_detach_member` (modules/l2/control/bridge.c) is only called during interface reconfiguration when an interface changes mode or domain, and it is always followed by another attach or VRF assignment. Therefore, it does not need to restore `vrf_id` via `vrf_default_get_or_create()`. Only `bridge_fini` (during bridge deletion) needs to explicitly restore the default VRF, as there is no subsequent attach operation.

Learnt from: rjarry
Repo: DPDK/grout PR: 466
File: modules/ip6/control/icmp6.c:173-178
Timestamp: 2025-12-18T16:04:37.486Z
Learning: In modules/ip/control/icmp.c and modules/ip6/control/icmp6.c, the mbufs stored in icmp_queue are owned by other pktmbuf pools that are freed elsewhere. It is not necessary to free these mbufs in icmp_fini/icmp6_fini before calling rte_mempool_free on the icmp_queue_item mempool.

Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:154-155
Timestamp: 2026-03-16T15:27:46.743Z
Learning: In the grout codebase, assume all mbufs are mono-segment. Therefore rte_pktmbuf_data_len(m) is equivalent to rte_pktmbuf_pkt_len(m); multi-segment mbuf concerns do not apply. Do not flag multi-segment handling issues in grout code. This guidance applies across grout C sources under modules/, and review checks should not raise multi-segment related issues for such files unless there is explicit evidence of multi-segment usage in a specific case.

Comment on lines +687 to 695
static int port_mac_del(struct iface *iface, struct iface_mac *m) {
struct iface_info_port *port = iface_info_port(iface);
struct port_mac *m;
uint8_t i;
int ret;

if (mac == NULL)
return errno_set(EINVAL);
if (rte_is_multicast_ether_addr(mac))
return 0; // ALLMULTI is always on

if (rte_is_same_ether_addr(mac, &port->mac))
return 0;

for (i = 0; i < port->filter.count; i++) {
m = &port->filter.macs[i];
if (rte_is_same_ether_addr(&m->mac, mac))
goto found;
}
return errno_set(ENOENT);

found:
if (--m->refcnt > 0) {
LOG(DEBUG,
"%s: mac " ETH_F " still filtered (refs=%u)",
iface->name,
mac,
m->refcnt);
return 0;
if (m->hardware) {
ret = rte_eth_dev_mac_addr_remove(port->port_id, &m->mac);
if (ret < 0 && ret != -ENOTSUP)
LOG(WARNING, "%s: %s", iface->name, rte_strerror(-ret));
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect the MAC delete hook call path and confirm whether state is removed after hook success.
rg -n -C4 'type->del_eth_addr\(iface, m\)|\.del_eth_addr = port_mac_del|rte_eth_dev_mac_addr_remove'

Repository: DPDK/grout

Length of output: 4095


🏁 Script executed:

cat -n modules/infra/control/port.c | sed -n '687,710p'

Repository: DPDK/grout

Length of output: 929


🏁 Script executed:

cat -n modules/infra/control/iface.c | sed -n '595,610p'

Repository: DPDK/grout

Length of output: 439


🏁 Script executed:

rg -n 'errno_log|^int errno_log|^void errno_log' --type c

Repository: DPDK/grout

Length of output: 5661


🏁 Script executed:

rg -n -A5 -B5 'will_return.*rte_eth_dev_mac_addr_remove.*-' modules/infra/control/iface_test.c

Repository: DPDK/grout

Length of output: 36


🏁 Script executed:

cat -n modules/infra/control/port.c | sed -n '687,730p'

Repository: DPDK/grout

Length of output: 1551


Propagate hardware MAC removal failures instead of silently continuing.

The function always returns 0 (line 726) even when rte_eth_dev_mac_addr_remove() fails. The caller immediately removes the MAC from tracking (iface.c:603) regardless of the hook's return value, leaving the NIC potentially still accepting traffic on that MAC while the app believes it's deleted.

🐛 Proposed fix
 	if (m->hardware) {
 		ret = rte_eth_dev_mac_addr_remove(port->port_id, &m->mac);
-		if (ret < 0 && ret != -ENOTSUP)
-			LOG(WARNING, "%s: %s", iface->name, rte_strerror(-ret));
+		if (ret < 0)
+			return errno_log(-ret, "rte_eth_dev_mac_addr_remove");
+		m->hardware = false;
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/infra/control/port.c` around lines 687 - 695, port_mac_del currently
swallows errors from rte_eth_dev_mac_addr_remove which allows the caller to
remove MAC tracking even if the NIC removal failed; modify port_mac_del so that
when rte_eth_dev_mac_addr_remove(port->port_id, &m->mac) returns a negative
error (except -ENOTSUP) it logs as before but then returns that error code to
the caller instead of falling through to return 0. Update the function
port_mac_del (and keep existing check for m->hardware and -ENOTSUP) to propagate
the failure (return ret) so callers that remove the MAC from tracking (caller in
iface.c) can abort on hardware removal failure.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before. I don't see what to do if removal failed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-marchand, that's a fair point — there's no sensible recovery path when rte_eth_dev_mac_addr_remove fails (the NIC is in an unknown state regardless), so forcing the caller to abort the tracking removal would only make things worse by leaving software state inconsistent with whatever hardware state actually is. Logging a warning and continuing is the right call here.

I'll drop this comment.


✏️ Learnings added
Learnt from: david-marchand
Repo: DPDK/grout PR: 579
File: modules/infra/control/port.c:687-695
Timestamp: 2026-04-23T08:46:26.517Z
Learning: In `modules/infra/control/port.c`, `port_mac_del` intentionally ignores errors from `rte_eth_dev_mac_addr_remove` (only logging a WARNING) because there is no meaningful recovery when hardware MAC filter removal fails — the MAC must be removed from software tracking regardless to avoid a worse inconsistency. Do not flag this as a missing error propagation issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: rjarry
Repo: DPDK/grout PR: 553
File: modules/infra/control/loopback.c:154-165
Timestamp: 2026-03-16T08:03:05.858Z
Learning: In the grout codebase, the `tun_pi` header prepended by the TUN device is NOT stripped in `modules/infra/control/loopback.c`. It is stripped in the datapath node `loopback_input_process` in `modules/infra/datapath/loop_input.c` via `rte_pktmbuf_adj(mbuf, sizeof(struct tun_pi))`, before the packet is dispatched to the L3 stack. Do not flag missing tun_pi stripping in loopback.c.

Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:156-160
Timestamp: 2026-03-17T08:11:06.305Z
Learning: In `modules/infra/datapath/port_rx.c`, the bounds check `rte_pktmbuf_data_len(m) >= csum_offset + 2` in `fix_l4_csum()` can never fail for virtio because the DPDK virtio driver guarantees all headers (including the L4 checksum field) are stored in the first segment/descriptor. Do not flag this guard as an issue in virtio RX paths.

Learnt from: rjarry
Repo: DPDK/grout PR: 504
File: modules/l2/control/bridge.c:60-77
Timestamp: 2026-02-14T09:31:42.766Z
Learning: In the DPDK/grout codebase, `bridge_detach_member` (modules/l2/control/bridge.c) is only called during interface reconfiguration when an interface changes mode or domain, and it is always followed by another attach or VRF assignment. Therefore, it does not need to restore `vrf_id` via `vrf_default_get_or_create()`. Only `bridge_fini` (during bridge deletion) needs to explicitly restore the default VRF, as there is no subsequent attach operation.

Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:154-155
Timestamp: 2026-03-16T15:27:46.743Z
Learning: In the grout codebase, assume all mbufs are mono-segment. Therefore rte_pktmbuf_data_len(m) is equivalent to rte_pktmbuf_pkt_len(m); multi-segment mbuf concerns do not apply. Do not flag multi-segment handling issues in grout code. This guidance applies across grout C sources under modules/, and review checks should not raise multi-segment related issues for such files unless there is explicit evidence of multi-segment usage in a specific case.

Comment thread modules/l2/control/fdb.c
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Collaborator

@rjarry rjarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I just wonder if we should make the fdb_sync_hardware() call effective only on ports that do not support real promisc.

Is this still a draft?

Comment thread modules/infra/api/iface.c Outdated
// Send primary MAC address first
struct rte_ether_addr primary_mac;
if (iface_get_eth_addr(iface, &primary_mac) == 0) {
struct gr_iface_mac *mac = malloc(sizeof(*mac));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to malloc here. You can use a stack variable.

Comment thread modules/infra/api/iface.c Outdated

// Send secondary MAC addresses
vec_foreach_ref (struct iface_mac *m, iface->macs) {
struct gr_iface_mac *mac = malloc(sizeof(*mac));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the same stack variable here.

Introduce a new top-level "mac" CLI context for managing both primary
and secondary MAC addresses on interfaces.

The primary MAC address can be changed with "mac set MAC iface IFACE".
Secondary MAC addresses can be added and removed with "mac add" and
"mac del" commands.

The "mac show" command displays all MAC addresses (both primary and
secondary) with their reference counts. The primary MAC is always shown
first with a "-" in the REFCNT column to distinguish it from secondary
addresses.

This provides a unified interface for MAC address operations instead of
having them scattered under different command contexts.
This also helps a lot when troubleshooting Rx filters and other fun
jokes about mac addresses.

Signed-off-by: David Marchand <david.marchand@redhat.com>
When adding a vlan interface in a bridge, pass on the promiscuous
configuration to the parent port.

Signed-off-by: David Marchand <david.marchand@redhat.com>
In setups where grout physical port is a VF, traffic sent by other
VFs reach grout only if the destination address is multicast or a mac
address for which a rx filter was declared in hw.

Make use of control path pushed FDB entries to populate hardware and
let the packets flow.

Signed-off-by: David Marchand <david.marchand@redhat.com>
@david-marchand
Copy link
Copy Markdown
Member Author

This looks good to me. I just wonder if we should make the fdb_sync_hardware() call effective only on ports that do not support real promisc.

The Rx filter approach should work in all cases.

Adding a global knob could be done, but I don't see how we could discover such "real" promisc capability.

Is this still a draft?

Unfortunately yes, I still want to check with mlx5 in both legacy and switchdev modes.
I hope I can finish this week.

@david-marchand
Copy link
Copy Markdown
Member Author

I double checked the mac workaround with mlx5 manually.

Note: it requires dpdk fix c62ff2e6d48b ("common/mlx5: fix MAC deletion on Linux") for deletions to work.
This fix is in v25.11.1.

As for the full switchdev setup, I am busy on other activities, but it should work as is (I expect a ENOTSUP coming from the mlx5 driver, in such a configuration).

@david-marchand david-marchand marked this pull request as ready for review April 29, 2026 13:49
@david-marchand
Copy link
Copy Markdown
Member Author

frr full synchronisation is still broken in my env, and I am not sure what is wrong...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants