Skip to content

Add L3VPN (type-5) support#566

Open
rjarry wants to merge 8 commits intoDPDK:mainfrom
rjarry:vxlan-l3vpn
Open

Add L3VPN (type-5) support#566
rjarry wants to merge 8 commits intoDPDK:mainfrom
rjarry:vxlan-l3vpn

Conversation

@rjarry
Copy link
Copy Markdown
Collaborator

@rjarry rjarry commented Mar 22, 2026

EVPN type-5 (IP prefix) routes allow routing traffic across a VXLAN overlay using symmetric IRB with a per-VRF L3 VNI. This adds support for L3VPN in grout without requiring the Linux bridge+SVI model -- VXLAN interfaces in VRF mode work directly.

VRF interfaces now get a random MAC at creation time, mirrored on the Linux VRF netdevice. When a VXLAN is in VRF mode, it inherits the VRF's MAC so that it matches the Router MAC advertised in EVPN type-5 routes. The FRR plugin presents VRF-mode VXLANs as bridge slaves of the VRF interface so that FRR's L3VNI logic recognizes the SVI and computes the Router MAC.

The FRR plugin handles DPLANE_OP_NEIGH_INSTALL to cache remote router MACs (RMACs) delivered by FRR before the corresponding nexthop install. When grout_add_nexthop processes an L3 nexthop in a VRF with an L3VNI, it redirects the interface from the VRF to the VXLAN and applies the cached RMAC. Both mappings are maintained in a dedicated l3vni_map module on the dplane thread without locking.

The mbuf private data overlap between eth_output_mbuf_data and iface_mbuf_data is fixed by adding a vtep field that ip_output sets from the nexthop gateway IP and eth_output preserves across the memory region transition.

Technical Changes

L3VNI State Management

  • Added dplane-thread-only module frr/l3vni_map implementing two lock-free hash maps:
    • VRF→VXLAN mapping: l3vni_set(vrf_id, vxlan_iface_id), l3vni_del(vrf_id), l3vni_get_vxlan(vrf_id) (returns GR_IFACE_ID_UNDEF when absent).
    • (VRF, VTEP)→RMAC cache: l3vni_rmac_set(vrf_id, vtep, mac), l3vni_rmac_del(vrf_id, vtep), l3vni_rmac_get(vrf_id, vtep) (returns NULL when absent).
  • Per-operation heap allocations and frees; operations assume exclusive dataplane-thread execution (no locks).

VRF Router MAC Handling and FRR L3VNI Integration

  • Introduced struct rte_ether_addr mac in struct gr_iface_info_vrf; VRF MAC randomized at creation (rte_eth_random_addr) and applied to kernel VRF netdevice (netlink_link_set_mac).
  • VXLAN interfaces in VRF mode inherit the VRF MAC when their explicit MAC is zero; iface_vxlan_reconfig computes/applies the effective RMAC and synchronizes it to the encapsulation VRF when applicable.
  • frr/if_grout.c reports VRF-mode VXLANs to FRR as bridge slaves of the VRF-derived interface by setting bridge_ifindex and slave_type and calls l3vni_set(vrf_id, vxlan_iface_id); on removal calls l3vni_del(vrf_id). This surfaces the VRF interface MAC as the Router MAC FRR expects for L3VNI handling.

Neighbor (RMAC) Handling in DPlane

  • Added grout_neigh_update_ctx(struct zebra_dplane_ctx *ctx) and wired handlers for DPLANE_OP_NEIGH_INSTALL/UPDATE/DELETE in zebra_dplane_grout.c.
  • grout_neigh_update_ctx processes only IPv4 neighbor VTEP addresses: on add stores RMAC via l3vni_rmac_set(vrf_id, vtep_ipv4, mac); on delete removes via l3vni_rmac_del(vrf_id, vtep_ipv4). Handler does not emit grout requests and always returns success.

Nexthop Redirection and RMAC Application

  • grout_add_nexthop() consults l3vni_get_vxlan(vrf_id); when present, rewrites nexthop.iface_id to the mapped VXLAN interface.
  • For redirected IPv4 nexthops, grout_add_nexthop() queries l3vni_rmac_get(vrf_id, vtep_ip); if found, copies the RMAC into the nexthop mac and sets GR_NH_F_REMOTE to mark a remote (no-ARP/ND) nexthop.
  • Added GR_NH_F_REMOTE flag and string mapping in gr_nexthop.h.

mbuf Private-data, vtep Field, and VXLAN Output Coupling

  • Extended eth_output_mbuf_data to include ip4_addr_t vtep.
  • ip_output_process() and ip6_output_process() set eth_data->vtep to the selected L3 gateway IPv4 when output iface is GR_IFACE_TYPE_VXLAN; otherwise zero.
  • eth_output_process() copies priv->vtep into iface_mbuf_data(mbuf)->vtep for each mbuf so vxlan_output can read the VTEP IP for RMAC lookup/encapsulation.

FRR Plugin and DPlane Routing Changes

  • frr/rt_grout.c includes l3vni_map.h, uses l3vni_get_vxlan in grout_add_nexthop for L3→L3VNI redirection and RMAC application, and implements grout_neigh_update_ctx() for neighbor RMAC caching.
  • zebra_dplane_grout.c dispatches neighbor install/update/delete ops to grout_neigh_update_ctx().

Control-plane Display and Configuration

  • VRF CLI and infra expose a mac attribute, parse/set it, and include it in VRF show output; iface_vrf_reconfig handles GR_VRF_SET_MAC.
  • iface_vrf_init initializes a randomized VRF MAC; iface_vrf_{get,set}_eth_addr accessors and VRF interface type callbacks added.

Miscellaneous

  • frr/meson.build updated to compile l3vni_map.c into the dplane_grout shared module.
  • Added smoke test smoke/evpn_l3vpn_frr_test.sh exercising EVPN type-5 L3VPN over VXLAN (L3VNI discovery, type-5 route exchange, RMAC presence on nexthops, route installation, and ICMP overlay reachability).
  • ip6_iface_llocal_init now returns early for VRF interfaces to avoid IPv6 link-local assignment on VRFs.
  • l4_input_local_process made protocol-aware and added per-mbuf L4 trace data and formatter (separate functional area).

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds thread-local L3VNI and RMAC caches via new files frr/l3vni_map.h and frr/l3vni_map.c with public accessors. Integrates these caches into grout and routing: grout_link_change() tracks L3VNI mappings and VRF MACs for VRF-mode VXLAN, grout_add_nexthop() routes L3 nexthops via mapped VXLAN ifaces and applies cached RMACs, and a new grout_neigh_update_ctx() updates RMAC cache from neighbor events. Datapath mbuf metadata gains a vtep field; VRFs get randomized and reconfigurable MACs. A smoke test for EVPN L3VPN over VXLAN was added.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/infra/control/vrf.c`:
- Line 151: The call to netlink_link_set_mac(vrf->vrf_ifindex, &vrf->mac)
ignores its return value; add proper error handling by checking the returned
errno-like value and logging and handling failure similarly to the pattern used
in ctlplane.c (e.g., check the int rc, call vlog_err or processLogger equivalent
with a descriptive message including vrf->name or vrf->vrf_ifindex and
strerror(rc), and decide whether to continue or return an error from the
enclosing function). Locate netlink_link_set_mac usage in
modules/infra/control/vrf.c and follow the same error-check/log-and-fail
semantics as the netlink link operations in ctlplane.c to ensure MAC set
failures are surfaced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 089de96b-5cf7-472c-aa7f-7f4f340041b8

📥 Commits

Reviewing files that changed from the base of the PR and between 0f7b91c and ffcd5e8.

📒 Files selected for processing (18)
  • frr/if_grout.c
  • frr/l3vni_map.c
  • frr/l3vni_map.h
  • frr/meson.build
  • frr/rt_grout.c
  • frr/rt_grout.h
  • frr/zebra_dplane_grout.c
  • modules/infra/api/gr_infra.h
  • modules/infra/api/gr_nexthop.h
  • modules/infra/cli/vrf.c
  • modules/infra/control/vrf.c
  • modules/infra/datapath/eth_output.c
  • modules/infra/datapath/gr_eth.h
  • modules/ip/datapath/ip_output.c
  • modules/ip6/datapath/ip6_output.c
  • modules/l2/control/vxlan.c
  • modules/l4/l4_input_local.c
  • smoke/evpn_l3vpn_frr_test.sh

strerror(errno));
return ret;
}
netlink_link_set_mac(vrf->vrf_ifindex, &vrf->mac);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing error check on netlink_link_set_mac.

The return value is ignored. If this fails, the kernel VRF device won't have the expected MAC, potentially causing EVPN router MAC mismatches. See modules/infra/control/ctlplane.c for the proper pattern.

Proposed fix
-		netlink_link_set_mac(vrf->vrf_ifindex, &vrf->mac);
+		if (netlink_link_set_mac(vrf->vrf_ifindex, &vrf->mac) < 0)
+			LOG(WARNING,
+			    "netlink_link_set_mac(%s): %s",
+			    iface->name,
+			    strerror(errno));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/infra/control/vrf.c` at line 151, The call to
netlink_link_set_mac(vrf->vrf_ifindex, &vrf->mac) ignores its return value; add
proper error handling by checking the returned errno-like value and logging and
handling failure similarly to the pattern used in ctlplane.c (e.g., check the
int rc, call vlog_err or processLogger equivalent with a descriptive message
including vrf->name or vrf->vrf_ifindex and strerror(rc), and decide whether to
continue or return an error from the enclosing function). Locate
netlink_link_set_mac usage in modules/infra/control/vrf.c and follow the same
error-check/log-and-fail semantics as the netlink link operations in ctlplane.c
to ensure MAC set failures are surfaced.

@chdxD1
Copy link
Copy Markdown

chdxD1 commented Mar 25, 2026

I am watching this project now for some time, thanks to Federico.

Thanks for your work on getting EVPN done here, that's much appreciated.

I was just reading through the MR text (not the code!)

VRF interfaces now get a random MAC at creation time, mirrored on the Linux VRF netdevice.

From my experience you want a system mac that is unique for the VTEP, not for each VRF. While technically possible from a protocol perspective, I've seen vendor implementations having the assumption of a VTEP 1:1 RMAC mapping.

@rjarry
Copy link
Copy Markdown
Collaborator Author

rjarry commented Mar 25, 2026

Hey Chris,

Thanks for the comment, that's great to get outside perspective.

From my experience you want a system mac that is unique for the VTEP, not for each VRF. While technically possible from a protocol perspective, I've seen vendor implementations having the assumption of a VTEP 1:1 RMAC mapping.

Do you think that will be an issue from an interoperability standpoint? I was trying to stick to Linux implementation where the RMAC is the bridge mac address where the VXLAN interface is enslaved.

We could think of a way of defining a "global" RMAC based on the encap VRF interface (i.e. the VRF where the VTEP IP address is defined). But that would require some tweaking to ensure packets can be received by the VXLAN interface.

@chdxD1
Copy link
Copy Markdown

chdxD1 commented Mar 25, 2026

Do you think that will be an issue from an interoperability standpoint? I was trying to stick to Linux implementation where the RMAC is the bridge mac address where the VXLAN interface is enslaved.

We could think of a way of defining a "global" RMAC based on the encap VRF interface (i.e. the VRF where the VTEP IP address is defined). But that would require some tweaking to ensure packets can be received by the VXLAN interface.

I detailed some of my points here FRRouting/frr#18190, but in general we've seen interoperability issues with SONiC. (which however might have been due to the FRR bug mentioned in the issue). Since then we always generated a single mac per system and use that for all L3VNI bridges. This is also what we see when we learn routes from other devices in the fabric (like Cisco, Juniper, ...)

@rjarry
Copy link
Copy Markdown
Collaborator Author

rjarry commented Mar 25, 2026

I can see an option to move forward. Instead of inheriting from the MAC address of the VRF interface associated with the VXLAN port, how about inheriting from the MAC address of the encap VRF interface (the underlay)?

Here's a visual of what I am thinking:

flowchart TB
    subgraph vrf-underlay1
    lound1["VTEP\n100.1.2.3\nRMAC1"]
    end
    subgraph vrf-underlay2
    lound2["VTEP\n200.3.2.1\nRMAC2"]
    end
    subgraph vrf-red
    vxlan100["vxlan100\nRMAC1"] -.- lound1
    end
    subgraph vrf-blue
    vxlan200["vxlan200\nRMAC1"] -.- lound1
    end
    subgraph vrf-green
    vxlan300["vxlan300\nRMAC2"] -.- lound2
    end
Loading

NB: this only applies to grout which does not require VXLAN interfaces to be artificially put into bridges when they are configured in L3 mode.

@rjarry
Copy link
Copy Markdown
Collaborator Author

rjarry commented Mar 25, 2026

Replying to my own comment. This proposal may not work with FRR which requires that the RMAC is the mac address of the bridge where the VXLAN interface is enslaved. Since grout does not need any bridge for L3VPN, we're talking about the VRF where the VXLAN interface is. So, the updated diagram would look like:

flowchart TB
    subgraph underlay1["VRF underlay1 -- RMAC1"]
    lound1["VTEP\n100.1.2.3"]
    end
    subgraph underlay2["VRF underlay2 -- RMAC2"]
    lound2["VTEP\n200.3.2.1"]
    end
    subgraph red["VRF red -- RMAC1"]
    vxlan100["vxlan100\nRMAC1"] -.- lound1
    end
    subgraph blue["VRF blue -- RMAC1"]
    vxlan200["vxlan200\nRMAC1"] -.- lound1
    end
    subgraph green["VRF green -- RMAC2"]
    vxlan300["vxlan300\nRMAC2"] -.- lound2
    end
Loading

@rjarry rjarry force-pushed the vxlan-l3vpn branch 3 times, most recently from 6a3eff3 to 4ead515 Compare March 31, 2026 07:34
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ef3461a-2d11-4dfc-9b18-d0ed48a40018

📥 Commits

Reviewing files that changed from the base of the PR and between bd7e9cd and 6a3eff3.

📒 Files selected for processing (14)
  • frr/if_grout.c
  • frr/l3vni_map.c
  • frr/l3vni_map.h
  • frr/meson.build
  • frr/rt_grout.c
  • frr/rt_grout.h
  • frr/zebra_dplane_grout.c
  • modules/infra/api/gr_nexthop.h
  • modules/infra/datapath/eth_output.c
  • modules/infra/datapath/gr_eth.h
  • modules/ip/datapath/ip_output.c
  • modules/ip6/datapath/ip6_output.c
  • modules/l2/control/vxlan.c
  • smoke/evpn_l3vpn_frr_test.sh
✅ Files skipped from review due to trivial changes (4)
  • frr/zebra_dplane_grout.c
  • frr/rt_grout.h
  • frr/l3vni_map.h
  • frr/meson.build
🚧 Files skipped from review as they are similar to previous changes (6)
  • modules/ip6/datapath/ip6_output.c
  • modules/infra/datapath/eth_output.c
  • modules/infra/api/gr_nexthop.h
  • modules/ip/datapath/ip_output.c
  • modules/l2/control/vxlan.c
  • frr/l3vni_map.c

Comment thread frr/rt_grout.c
Comment thread frr/rt_grout.c
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/infra/cli/vrf.c`:
- Around line 84-85: The CLI is setting the wrong attribute bit: replace
GR_VLAN_SET_MAC with GR_VRF_SET_MAC in the arg_eth_addr handling (the code that
currently calls arg_eth_addr(p, "MAC", &info->mac) and then sets set_attrs |=
GR_VLAN_SET_MAC) so it matches the reconfig handler iface_vrf_reconfig which
checks GR_VRF_SET_MAC; also update the validation mask (the mask near line 62)
to include GR_VRF_SET_MAC so the attribute is allowed during validation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8332c552-581d-48cc-8fe8-3a534b15b882

📥 Commits

Reviewing files that changed from the base of the PR and between 6a3eff3 and 4ead515.

📒 Files selected for processing (18)
  • frr/if_grout.c
  • frr/l3vni_map.c
  • frr/l3vni_map.h
  • frr/meson.build
  • frr/rt_grout.c
  • frr/rt_grout.h
  • frr/zebra_dplane_grout.c
  • modules/infra/api/gr_infra.h
  • modules/infra/api/gr_nexthop.h
  • modules/infra/cli/vrf.c
  • modules/infra/control/vrf.c
  • modules/infra/datapath/eth_output.c
  • modules/infra/datapath/gr_eth.h
  • modules/ip/datapath/ip_output.c
  • modules/ip6/control/address.c
  • modules/ip6/datapath/ip6_output.c
  • modules/l2/control/vxlan.c
  • smoke/evpn_l3vpn_frr_test.sh
✅ Files skipped from review due to trivial changes (5)
  • frr/rt_grout.h
  • modules/infra/api/gr_nexthop.h
  • frr/meson.build
  • modules/l2/control/vxlan.c
  • frr/l3vni_map.h
🚧 Files skipped from review as they are similar to previous changes (8)
  • modules/infra/datapath/eth_output.c
  • modules/infra/api/gr_infra.h
  • frr/zebra_dplane_grout.c
  • modules/infra/datapath/gr_eth.h
  • modules/ip/datapath/ip_output.c
  • frr/if_grout.c
  • frr/rt_grout.c
  • modules/infra/control/vrf.c

Comment thread modules/infra/cli/vrf.c
@rjarry rjarry force-pushed the vxlan-l3vpn branch 7 times, most recently from 5ad84d9 to 9cb705c Compare April 5, 2026 21:11
@rjarry rjarry mentioned this pull request Apr 5, 2026
@rjarry rjarry changed the title Add EVPN L3VPN (type-5) support Add L3VPN (type-5) support Apr 16, 2026
@rjarry rjarry force-pushed the vxlan-l3vpn branch 2 times, most recently from 62c039c to 1d10df4 Compare April 20, 2026 18:41
rjarry added 5 commits April 21, 2026 15:25
Add trace output showing source and destination ports for UDP and TCP
packets in l4_input_local. Also parse TCP headers to extract ports
before falling through to management plane forwarding.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
VRF interfaces now get a random MAC address at creation time. This MAC
will be used as the Router MAC for EVPN L3VNI type-5 routes.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
VXLAN interfaces now inherit from their VTEP VRF (encap_vrf_id) MAC
address instead of a random one. Some EVPN endpoints make the assumption
that there is a unique RMAC per VTEP.

When a VXLAN interface is in VRF mode (or moves to one), also
synchronize that MAC address to the VRF interface. This ensures the
VXLAN interface's MAC matches the Router MAC that FRR advertises in EVPN
type-5 routes, so incoming L3VPN packets pass eth_input's check.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
All GR_MBUF_PRIV_DATA_TYPE types overlay the same memory region. When
ip_output resolves a nexthop on a VXLAN interface, it writes
eth_output_mbuf_data->dst and ->ether_type which clobber
iface_mbuf_data->vtep. By the time vxlan_output reads the vtep field,
it contains garbage.

Add a vtep field to eth_output_mbuf_data and set it from the nexthop
gateway IP when the output interface is a VXLAN. In eth_output, save
the vtep value before writing iface_mbuf_data (which shares the same
memory) and restore it after.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add a flag to mark nexthops learned from remote VTEPs via EVPN. These
nexthops carry a known IP+MAC pair from the control plane and are set
to GR_NH_S_REACHABLE with GR_NH_F_STATIC so they bypass ARP/ND
probing and aging.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
rjarry added 3 commits April 21, 2026 15:25
EVPN type-5 routes require two pieces of state that FRR delivers out
of order: the remote router MAC arrives via DPLANE_OP_NEIGH_INSTALL
before the nexthop via DPLANE_OP_NH_INSTALL. Both need to be combined
when grout_add_nexthop builds the GR_NH_ADD request.

Add l3vni_map to maintain two hash tables on the dplane thread:

  vrf_id -> vxlan_iface_id:  used to redirect nexthops from the VRF
  interface (FRR's SVI model) to the VXLAN interface so that ip_output
  routes packets into the tunnel.

  (vrf_id, vtep) -> RMAC:  caches the remote router MAC until the
  matching nexthop install arrives.

Both tables run exclusively on the dplane thread so no locking is
needed.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Handle EVPN type-5 (IP prefix) routes with symmetric IRB over VXLAN.

Present VRF-mode VXLAN interfaces to FRR as bridge slaves of the VRF
interface. FRR requires an SVI derived from a bridge master to bring the
L3VNI up and compute the Router MAC for type-5 routes. The VRF MAC (set
in a previous commit) serves as the Router MAC.

Handle DPLANE_OP_NEIGH_INSTALL/DELETE to cache remote router MACs
delivered by FRR before the corresponding nexthop install arrives. When
grout_add_nexthop processes an L3 nexthop in a VRF with an L3VNI, it
redirects the interface from VRF to VXLAN and applies the cached RMAC.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Verify EVPN type-5 IP prefix route exchange and L3 connectivity over
VXLAN between FRR+grout and a standalone FRR+Linux peer. Each side
has a VRF with an L3 VNI (1000) and a host on a local subnet. BGP
EVPN advertises connected prefixes as type-5 routes.

Grout uses a VXLAN in VRF mode (no bridge needed). The peer uses the
standard Linux bridge+SVI model. The test verifies L3VNI recognition,
type-5 route exchange, route installation in the VRF, RMAC presence
on the route nexthop, and end-to-end ping through the overlay.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants