Skip to content

Conversation

@Josue-T
Copy link

@Josue-T Josue-T commented Jun 19, 2023

Problème

On my side the kernel 6.1 crash with this following stacktrace:

[   17.002328] ------------[ cut here ]------------
[   17.006989] kernel BUG at drivers/regulator/core.c:424!
[   17.012235] Internal error: Oops - BUG: 0 [#1] SMP ARM
[   17.017391] Modules linked in: lima drm_shmem_helper gpu_sched spi_mt65xx pwm_mediatek sha1_arm sha256_arm sha512_arm aes_arm nf_conntrack_ftp ip6table_filter ip6_tables xt_state nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse configfs ip_tables x_tables
[   17.040022] CPU: 1 PID: 20 Comm: kworker/1:0 Not tainted 6.1.22-bpi-r2-main #1
[   17.047266] Hardware name: Mediatek Cortex-A7 (Device Tree)
[   17.052853] Workqueue: events dbs_work_handler
[   17.057336] PC is at regulator_check_voltage+0xa8/0xec
[   17.062504] LR is at 0x118c30
[   17.065486] pc : [<c066bae0>]    lr : [<00118c30>]    psr: 200e0013
[   17.071763] sp : e087de00  ip : c19c4800  fp : c27d88b4
[   17.077007] r10: c1b15ec0  r9 : c2cb5a00  r8 : 00118c30
[   17.082268] r7 : c2606c00  r6 : 0013d620  r5 : 00000000  r4 : c2606c00
[   17.088818] r3 : c0670014  r2 : e087de18  r1 : e087de1c  r0 : c2606c00
[   17.095391] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   17.102610] Control: 10c5387d  Table: 886c806a  DAC: 00000051
[   17.108390] Register r0 information: slab kmalloc-1k start c2606c00 pointer offset 0 size 1024
[   17.117076] Register r1 information: 2-page vmalloc region starting at 0xe087c000 allocated at copy_process+0x1b8/0x10ec
[   17.128028] Register r2 information: 2-page vmalloc region starting at 0xe087c000 allocated at copy_process+0x1b8/0x10ec
[   17.138982] Register r3 information: non-slab/vmalloc memory
[   17.144689] Register r4 information: slab kmalloc-1k start c2606c00 pointer offset 0 size 1024
[   17.153355] Register r5 information: NULL pointer
[   17.158109] Register r6 information: non-paged memory
[   17.163206] Register r7 information: slab kmalloc-1k start c2606c00 pointer offset 0 size 1024
[   17.171873] Register r8 information: non-paged memory
[   17.176980] Register r9 information: slab kmalloc-64 start c2cb5a00 pointer offset 0 size 64
[   17.185487] Register r10 information: slab kmalloc-64 start c1b15ec0 pointer offset 0 size 64
[   17.194059] Register r11 information: slab kmalloc-192 start c27d8840 pointer offset 116 size 192
[   17.203003] Register r12 information: slab task_struct start c19c4800 pointer offset 0
[   17.210954] Process kworker/1:0 (pid: 20, stack limit = 0xb67caf0a)
[   17.217238] Stack: (0xe087de00 to 0xe087e000)
[   17.221628] de00: c2d0a980 00000000 0013d620 c2606c00 00118c30 c066e9b8 00118c30 0013d620
[   17.229881] de20: c2d0a980 00118c30 0013d620 0013d620 00118c30 c066ecd0 00124f80 c19c4800
[   17.238122] de40: 00000019 00000001 00000000 c654da05 c2d0ae00 c27d8880 00124f80 c0929e50
[   17.246336] de60: ded98050 c2c61800 3dfd2400 0013d620 4d7c6d00 c654da05 c1486140 c2c61800
[   17.254550] de80: 00000000 00000007 00000000 c14e703c 000fde80 0013d620 c1486140 c0922e6c
[   17.262767] dea0: dedb1470 c0cbe3e0 c129a370 ffffffff c2c61800 c2c61800 000fde80 0013d620
[   17.270980] dec0: 0000002c c654da05 dedb4305 c2c61800 c9469400 c9469580 c6099080 c9469400
[   17.279199] dee0: c9469580 dedb4305 c1473260 c09275b4 c946943c 00000000 c9469404 c2c61800
[   17.287397] df00: c144b604 c19c4800 dedb4305 c0928574 c946943c c1827c80 dedb10c0 dedb4300
[   17.295609] df20: 00000000 c0143dac c1827c98 c19c4800 dedb10c0 dedb10c0 dedb10dc c1827c80
[   17.303838] df40: dedb10c0 c1827c98 dedb10dc c1303d40 00000008 c19c4800 dedb10c0 c0144508
[   17.312050] df60: c19c4800 c14727b6 c1827c80 c19b7900 c19b7a80 e0825dec c19c4800 c01444ac
[   17.320247] df80: c1827c80 00000000 00000000 c014bd48 c19b7900 c014bc80 00000000 00000000
[   17.328443] dfa0: 00000000 00000000 00000000 c0100148 00000000 00000000 00000000 00000000
[   17.336637] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   17.344831] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[   17.353041]  regulator_check_voltage from regulator_set_voltage_unlocked+0xb4/0x124
[   17.360770]  regulator_set_voltage_unlocked from regulator_set_voltage+0x4c/0x8c
[   17.368204]  regulator_set_voltage from mtk_cpufreq_set_target+0xf8/0x394
[   17.375039]  mtk_cpufreq_set_target from __target_index+0x88/0x254
[   17.381254]  __target_index from od_dbs_update+0x150/0x17c
[   17.386771]  od_dbs_update from dbs_work_handler+0x34/0x60
[   17.392291]  dbs_work_handler from process_one_work+0x1c0/0x4a4
[   17.398247]  process_one_work from worker_thread+0x5c/0x50c
[   17.403850]  worker_thread from kthread+0xc8/0xcc
[   17.408582]  kthread from ret_from_fork+0x14/0x2c
[   17.413309] Exception stack(0xe087dfb0 to 0xe087dff8)
[   17.418378] dfa0:                                     00000000 00000000 00000000 00000000
[   17.426584] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   17.434794] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   17.441430] Code: e1500005 a1a05000 aafffff4 eafffff1 (e7f001f2)
[   17.447548] ---[ end trace 0000000000000000 ]---

It's also a known issue here: https://bugzilla.kernel.org/show_bug.cgi?id=216690

Solution

Backport the patch from the 6.3-rc kernel to the LTS kernel version to fix the issue.

I've tested on my BPI R2 and it fix the issue on my side.

jiaweichang-mtk and others added 4 commits June 14, 2023 07:30
In order to prevent passing zero to 'PTR_ERR' in
mtk_cpu_dvfs_info_init(), we fix the return value of of_get_cci() using
error pointer by explicitly casting error number.

Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Fixes: 0daa473 ("cpufreq: mediatek: Link CCI device to CPU")
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
…/clk_put

Any kind of failure in mtk_cpu_dvfs_info_init() will lead to calling
regulator_put() or clk_put() and the KP will occur since the regulator/clk
handlers are used after released in mtk_cpu_dvfs_info_release().

To prevent the usage after regulator_put()/clk_put(), the regulator/clk
handlers are addressed in a way of "Free the Last Thing Style".

Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Fixes: 4b9ceb7 ("cpufreq: mediatek: Enable clocks and regulators")
Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Suggested-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Since the upper boundary of proc/sram voltage of MT8516 is 1300 mV,
which is greater than the value of MT2701 1150 mV, we fix it by adding
the corresponding platform data and specify proc/sram_max_volt to
support MT8516.

Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Fixes: ead858b ("cpufreq: mediatek: Move voltage limits to platform data")
Fixes: 6a17b38 ("cpufreq: mediatek: Refine mtk_cpufreq_voltage_tracking()")
Reported-by: Nick Hainke <vincent@systemli.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
During the addition of SRAM voltage tracking for CCI scaling, this
driver got some voltage limits set for the vtrack algorithm: these
were moved to platform data first, then enforced in a later commit
6a17b38 ("cpufreq: mediatek: Refine mtk_cpufreq_voltage_tracking()")
using these as max values for the regulator_set_voltage() calls.

In this case, the vsram/vproc constraints for MT7622 and MT7623
were supposed to be the same as MT2701 (and a number of other SoCs),
but that turned out to be a mistake because the aforementioned two
SoCs' maximum voltage for both VPROC and VPROC_SRAM is 1.36V.

Fix that by adding new platform data for MT7622/7623 declaring the
right {proc,sram}_max_volt parameter.

Fixes: ead858b ("cpufreq: mediatek: Move voltage limits to platform data")
Fixes: 6a17b38 ("cpufreq: mediatek: Refine mtk_cpufreq_voltage_tracking()")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
@frank-w
Copy link
Owner

frank-w commented Jun 19, 2023

Hi,

Thx for tracing and reporting it

need to update lts branches in next time,maybe these patches already backported in higher dot release...

Have you tried merging latest 6.1.x from stable git?

Will merge the pr if it is still needed.

@Josue-T
Copy link
Author

Josue-T commented Jun 19, 2023

Hello,

I've checked and I as I see it's also backported on the last upstream release 6.1.x.

@frank-w
Copy link
Owner

frank-w commented Jun 19, 2023

i have merged stable 6.1.x into 6.1-main, so you can test with this base. if it is still not working you can rebase the PR to the new base

@Josue-T
Copy link
Author

Josue-T commented Jun 19, 2023

I've tested and it works so it's perfect. Thanks

@Josue-T Josue-T closed this Jun 19, 2023
@frank-w
Copy link
Owner

frank-w commented Jun 21, 2023

Thx for testing

frank-w pushed a commit that referenced this pull request Nov 14, 2025
[ Upstream commit 094ee60 ]

Following operations can trigger a warning[1]:

    ip netns add ns1
    ip netns exec ns1 ip link add bond0 type bond mode balance-rr
    ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp
    ip netns exec ns1 ip link set bond0 type bond mode broadcast
    ip netns del ns1

When delete the namespace, dev_xdp_uninstall() is called to remove xdp
program on bond dev, and bond_xdp_set() will check the bond mode. If bond
mode is changed after attaching xdp program, the warning may occur.

Some bond modes (broadcast, etc.) do not support native xdp. Set bond mode
with xdp program attached is not good. Add check for xdp program when set
bond mode.

    [1]
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930
    Modules linked in:
    CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 #107
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
    Workqueue: netns cleanup_net
    RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930
    Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ...
    RSP: 0018:ffffc90000063d80 EFLAGS: 00000282
    RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff
    RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48
    RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb
    R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8
    R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000
    FS:  0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0
    Call Trace:
     <TASK>
     ? __warn+0x83/0x130
     ? unregister_netdevice_many_notify+0x8d9/0x930
     ? report_bug+0x18e/0x1a0
     ? handle_bug+0x54/0x90
     ? exc_invalid_op+0x18/0x70
     ? asm_exc_invalid_op+0x1a/0x20
     ? unregister_netdevice_many_notify+0x8d9/0x930
     ? bond_net_exit_batch_rtnl+0x5c/0x90
     cleanup_net+0x237/0x3d0
     process_one_work+0x163/0x390
     worker_thread+0x293/0x3b0
     ? __pfx_worker_thread+0x10/0x10
     kthread+0xec/0x1e0
     ? __pfx_kthread+0x10/0x10
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x2f/0x50
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    ---[ end trace 0000000000000000 ]---

Fixes: 9e2ee5c ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Acked-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250321044852.1086551-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
frank-w pushed a commit that referenced this pull request Dec 7, 2025
[ Upstream commit dfe28c4 ]

The validation of the set(nsh(...)) action is completely wrong.
It runs through the nsh_key_put_from_nlattr() function that is the
same function that validates NSH keys for the flow match and the
push_nsh() action.  However, the set(nsh(...)) has a very different
memory layout.  Nested attributes in there are doubled in size in
case of the masked set().  That makes proper validation impossible.

There is also confusion in the code between the 'masked' flag, that
says that the nested attributes are doubled in size containing both
the value and the mask, and the 'is_mask' that says that the value
we're parsing is the mask.  This is causing kernel crash on trying to
write into mask part of the match with SW_FLOW_KEY_PUT() during
validation, while validate_nsh() doesn't allocate any memory for it:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ #107 PREEMPT(voluntary)
  RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch]
  Call Trace:
   <TASK>
   validate_nsh+0x60/0x90 [openvswitch]
   validate_set.constprop.0+0x270/0x3c0 [openvswitch]
   __ovs_nla_copy_actions+0x477/0x860 [openvswitch]
   ovs_nla_copy_actions+0x8d/0x100 [openvswitch]
   ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch]
   genl_family_rcv_msg_doit+0xdb/0x130
   genl_family_rcv_msg+0x14b/0x220
   genl_rcv_msg+0x47/0xa0
   netlink_rcv_skb+0x53/0x100
   genl_rcv+0x24/0x40
   netlink_unicast+0x280/0x3b0
   netlink_sendmsg+0x1f7/0x430
   ____sys_sendmsg+0x36b/0x3a0
   ___sys_sendmsg+0x87/0xd0
   __sys_sendmsg+0x6d/0xd0
   do_syscall_64+0x7b/0x2c0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

The third issue with this process is that while trying to convert
the non-masked set into masked one, validate_set() copies and doubles
the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested
attributes.  It should be copying each nested attribute and doubling
them in size independently.  And the process must be properly reversed
during the conversion back from masked to a non-masked variant during
the flow dump.

In the end, the only two outcomes of trying to use this action are
either validation failure or a kernel crash.  And if somehow someone
manages to install a flow with such an action, it will most definitely
not do what it is supposed to, since all the keys and the masks are
mixed up.

Fixing all the issues is a complex task as it requires re-writing
most of the validation code.

Given that and the fact that this functionality never worked since
introduction, let's just remove it altogether.  It's better to
re-introduce it later with a proper implementation instead of trying
to fix it in stable releases.

Fixes: b2d0f5d ("openvswitch: enable NSH support")
Reported-by: Junvy Yang <zhuque@tencent.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w pushed a commit that referenced this pull request Dec 7, 2025
[ Upstream commit dfe28c4 ]

The validation of the set(nsh(...)) action is completely wrong.
It runs through the nsh_key_put_from_nlattr() function that is the
same function that validates NSH keys for the flow match and the
push_nsh() action.  However, the set(nsh(...)) has a very different
memory layout.  Nested attributes in there are doubled in size in
case of the masked set().  That makes proper validation impossible.

There is also confusion in the code between the 'masked' flag, that
says that the nested attributes are doubled in size containing both
the value and the mask, and the 'is_mask' that says that the value
we're parsing is the mask.  This is causing kernel crash on trying to
write into mask part of the match with SW_FLOW_KEY_PUT() during
validation, while validate_nsh() doesn't allocate any memory for it:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ #107 PREEMPT(voluntary)
  RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch]
  Call Trace:
   <TASK>
   validate_nsh+0x60/0x90 [openvswitch]
   validate_set.constprop.0+0x270/0x3c0 [openvswitch]
   __ovs_nla_copy_actions+0x477/0x860 [openvswitch]
   ovs_nla_copy_actions+0x8d/0x100 [openvswitch]
   ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch]
   genl_family_rcv_msg_doit+0xdb/0x130
   genl_family_rcv_msg+0x14b/0x220
   genl_rcv_msg+0x47/0xa0
   netlink_rcv_skb+0x53/0x100
   genl_rcv+0x24/0x40
   netlink_unicast+0x280/0x3b0
   netlink_sendmsg+0x1f7/0x430
   ____sys_sendmsg+0x36b/0x3a0
   ___sys_sendmsg+0x87/0xd0
   __sys_sendmsg+0x6d/0xd0
   do_syscall_64+0x7b/0x2c0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

The third issue with this process is that while trying to convert
the non-masked set into masked one, validate_set() copies and doubles
the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested
attributes.  It should be copying each nested attribute and doubling
them in size independently.  And the process must be properly reversed
during the conversion back from masked to a non-masked variant during
the flow dump.

In the end, the only two outcomes of trying to use this action are
either validation failure or a kernel crash.  And if somehow someone
manages to install a flow with such an action, it will most definitely
not do what it is supposed to, since all the keys and the masks are
mixed up.

Fixing all the issues is a complex task as it requires re-writing
most of the validation code.

Given that and the fact that this functionality never worked since
introduction, let's just remove it altogether.  It's better to
re-introduce it later with a proper implementation instead of trying
to fix it in stable releases.

Fixes: b2d0f5d ("openvswitch: enable NSH support")
Reported-by: Junvy Yang <zhuque@tencent.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w pushed a commit that referenced this pull request Dec 7, 2025
[ Upstream commit 094ee60 ]

Following operations can trigger a warning[1]:

    ip netns add ns1
    ip netns exec ns1 ip link add bond0 type bond mode balance-rr
    ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp
    ip netns exec ns1 ip link set bond0 type bond mode broadcast
    ip netns del ns1

When delete the namespace, dev_xdp_uninstall() is called to remove xdp
program on bond dev, and bond_xdp_set() will check the bond mode. If bond
mode is changed after attaching xdp program, the warning may occur.

Some bond modes (broadcast, etc.) do not support native xdp. Set bond mode
with xdp program attached is not good. Add check for xdp program when set
bond mode.

    [1]
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930
    Modules linked in:
    CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 #107
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
    Workqueue: netns cleanup_net
    RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930
    Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ...
    RSP: 0018:ffffc90000063d80 EFLAGS: 00000282
    RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff
    RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48
    RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb
    R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8
    R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000
    FS:  0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0
    Call Trace:
     <TASK>
     ? __warn+0x83/0x130
     ? unregister_netdevice_many_notify+0x8d9/0x930
     ? report_bug+0x18e/0x1a0
     ? handle_bug+0x54/0x90
     ? exc_invalid_op+0x18/0x70
     ? asm_exc_invalid_op+0x1a/0x20
     ? unregister_netdevice_many_notify+0x8d9/0x930
     ? bond_net_exit_batch_rtnl+0x5c/0x90
     cleanup_net+0x237/0x3d0
     process_one_work+0x163/0x390
     worker_thread+0x293/0x3b0
     ? __pfx_worker_thread+0x10/0x10
     kthread+0xec/0x1e0
     ? __pfx_kthread+0x10/0x10
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x2f/0x50
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    ---[ end trace 0000000000000000 ]---

Fixes: 9e2ee5c ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Acked-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250321044852.1086551-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
frank-w pushed a commit that referenced this pull request Dec 7, 2025
[ Upstream commit dfe28c4 ]

The validation of the set(nsh(...)) action is completely wrong.
It runs through the nsh_key_put_from_nlattr() function that is the
same function that validates NSH keys for the flow match and the
push_nsh() action.  However, the set(nsh(...)) has a very different
memory layout.  Nested attributes in there are doubled in size in
case of the masked set().  That makes proper validation impossible.

There is also confusion in the code between the 'masked' flag, that
says that the nested attributes are doubled in size containing both
the value and the mask, and the 'is_mask' that says that the value
we're parsing is the mask.  This is causing kernel crash on trying to
write into mask part of the match with SW_FLOW_KEY_PUT() during
validation, while validate_nsh() doesn't allocate any memory for it:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ #107 PREEMPT(voluntary)
  RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch]
  Call Trace:
   <TASK>
   validate_nsh+0x60/0x90 [openvswitch]
   validate_set.constprop.0+0x270/0x3c0 [openvswitch]
   __ovs_nla_copy_actions+0x477/0x860 [openvswitch]
   ovs_nla_copy_actions+0x8d/0x100 [openvswitch]
   ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch]
   genl_family_rcv_msg_doit+0xdb/0x130
   genl_family_rcv_msg+0x14b/0x220
   genl_rcv_msg+0x47/0xa0
   netlink_rcv_skb+0x53/0x100
   genl_rcv+0x24/0x40
   netlink_unicast+0x280/0x3b0
   netlink_sendmsg+0x1f7/0x430
   ____sys_sendmsg+0x36b/0x3a0
   ___sys_sendmsg+0x87/0xd0
   __sys_sendmsg+0x6d/0xd0
   do_syscall_64+0x7b/0x2c0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

The third issue with this process is that while trying to convert
the non-masked set into masked one, validate_set() copies and doubles
the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested
attributes.  It should be copying each nested attribute and doubling
them in size independently.  And the process must be properly reversed
during the conversion back from masked to a non-masked variant during
the flow dump.

In the end, the only two outcomes of trying to use this action are
either validation failure or a kernel crash.  And if somehow someone
manages to install a flow with such an action, it will most definitely
not do what it is supposed to, since all the keys and the masks are
mixed up.

Fixing all the issues is a complex task as it requires re-writing
most of the validation code.

Given that and the fact that this functionality never worked since
introduction, let's just remove it altogether.  It's better to
re-introduce it later with a proper implementation instead of trying
to fix it in stable releases.

Fixes: b2d0f5d ("openvswitch: enable NSH support")
Reported-by: Junvy Yang <zhuque@tencent.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants