Skip to content

Conversation

@ohoang7bh
Copy link

No description provided.

@chewitt
Copy link
Owner

chewitt commented Jan 29, 2023

Thanks, but:

  • Filename contains personal names (not relevant for the kernel)
  • The content appears to be from a dtc dump, i.e. lots of values from a compiled dts
  • The device-tree makes no attempt to use existing dtsi files, e.g. meson-g12b-w400.dtsi
  • The device-tree has not been submitted upstream to the kernel
  • The assignment of Copyright to Amlogic is supicious; esp. if the source dts is based on upstream work
  • The PR is created against the upstream/master branch; but my changes are all in amlogic-6.x.y topic branches

I suggest you identify differences between the existing AM6 device and AM6B+ device, then create a new/clean device-tree file based on the existing w400 dtsi (the approach used in the existing AM6 dts) to avoid duplication.

NB: I will be happy to backport a device-tree that has been submitted to the kernel. I have no interest in adding devices to my tree that will not be upstreamed (as I do not have the device, and cannot test it).

@chewitt chewitt closed this Jan 29, 2023
@chewitt chewitt mentioned this pull request Jan 29, 2023
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit 3c46372 ]

This lockdep splat says it better than I could:

================================
WARNING: inconsistent lock state
6.2.0-rc2-07010-ga9b9500ffaac-dirty torvalds#967 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0
{IN-SOFTIRQ-W} state was registered at:
  _raw_spin_lock+0x5c/0xc0
  sch_direct_xmit+0x148/0x37c
  __dev_queue_xmit+0x528/0x111c
  ip6_finish_output2+0x5ec/0xb7c
  ip6_finish_output+0x240/0x3f0
  ip6_output+0x78/0x360
  ndisc_send_skb+0x33c/0x85c
  ndisc_send_rs+0x54/0x12c
  addrconf_rs_timer+0x154/0x260
  call_timer_fn+0xb8/0x3a0
  __run_timers.part.0+0x214/0x26c
  run_timer_softirq+0x3c/0x74
  __do_softirq+0x14c/0x5d8
  ____do_softirq+0x10/0x20
  call_on_irq_stack+0x2c/0x5c
  do_softirq_own_stack+0x1c/0x30
  __irq_exit_rcu+0x168/0x1a0
  irq_exit_rcu+0x10/0x40
  el1_interrupt+0x38/0x64
irq event stamp: 7825
hardirqs last  enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130
hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8
softirqs last  enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8
softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(_xmit_ETHER#2);
  <Interrupt>
    lock(_xmit_ETHER#2);

 *** DEADLOCK ***

3 locks held by kworker/1:3/179:
 #0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
 #1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
 #2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34

Workqueue: events enetc_tx_onestep_tstamp
Call trace:
 print_usage_bug.part.0+0x208/0x22c
 mark_lock+0x7f0/0x8b0
 __lock_acquire+0x7c4/0x1ce0
 lock_acquire.part.0+0xe0/0x220
 lock_acquire+0x68/0x84
 _raw_spin_lock+0x5c/0xc0
 netif_freeze_queues+0x5c/0xc0
 netif_tx_lock+0x24/0x34
 enetc_tx_onestep_tstamp+0x20/0x100
 process_one_work+0x28c/0x6c0
 worker_thread+0x74/0x450
 kthread+0x118/0x11c

but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in
process context, therefore with softirqs enabled (i.o.w., it can be
interrupted by a softirq). If we hold the netif_tx_lock() when there is
an interrupt, and the NET_TX softirq then gets scheduled, this will take
the netif_tx_lock() a second time and deadlock the kernel.

To solve this, use netif_tx_lock_bh(), which blocks softirqs from
running.

Fixes: 7294380 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230112105440.1786799-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit 241f519 ]

This attempts to avoid circular locking dependency between sock_lock
and hdev_lock:

WARNING: possible circular locking dependency detected
6.0.0-rc7-03728-g18dd8ab0a783 #3 Not tainted
------------------------------------------------------
kworker/u3:2/53 is trying to acquire lock:
ffff888000254130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
iso_conn_del+0xbd/0x1d0
but task is already holding lock:
ffffffff9f39a080 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_le_cis_estabilished_evt+0x1b5/0x500
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (hci_cb_list_lock){+.+.}-{3:3}:
       __mutex_lock+0x10e/0xfe0
       hci_le_remote_feat_complete_evt+0x17f/0x320
       hci_event_packet+0x39c/0x7d0
       hci_rx_work+0x2bf/0x950
       process_one_work+0x569/0x980
       worker_thread+0x2a3/0x6f0
       kthread+0x153/0x180
       ret_from_fork+0x22/0x30
-> #1 (&hdev->lock){+.+.}-{3:3}:
       __mutex_lock+0x10e/0xfe0
       iso_connect_cis+0x6f/0x5a0
       iso_sock_connect+0x1af/0x710
       __sys_connect+0x17e/0x1b0
       __x64_sys_connect+0x37/0x50
       do_syscall_64+0x43/0x90
       entry_SYSCALL_64_after_hwframe+0x62/0xcc
-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
       __lock_acquire+0x1b51/0x33d0
       lock_acquire+0x16f/0x3b0
       lock_sock_nested+0x32/0x80
       iso_conn_del+0xbd/0x1d0
       iso_connect_cfm+0x226/0x680
       hci_le_cis_estabilished_evt+0x1ed/0x500
       hci_event_packet+0x39c/0x7d0
       hci_rx_work+0x2bf/0x950
       process_one_work+0x569/0x980
       worker_thread+0x2a3/0x6f0
       kthread+0x153/0x180
       ret_from_fork+0x22/0x30
other info that might help us debug this:
Chain exists of:
  sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> &hdev->lock --> hci_cb_list_lock
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(hci_cb_list_lock);
                               lock(&hdev->lock);
                               lock(hci_cb_list_lock);
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
 *** DEADLOCK ***
4 locks held by kworker/u3:2/53:
 #0: ffff8880021d9130 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
 process_one_work+0x4ad/0x980
 #1: ffff888002387de0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
 at: process_one_work+0x4ad/0x980
 #2: ffff888001ac0070 (&hdev->lock){+.+.}-{3:3}, at:
 hci_le_cis_estabilished_evt+0xc3/0x500
 #3: ffffffff9f39a080 (hci_cb_list_lock){+.+.}-{3:3}, at:
 hci_le_cis_estabilished_evt+0x1b5/0x500

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 6a5ad25 ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit 6a5ad25 ]

This attempts to fix the following trace:

kworker/u3:1/184 is trying to acquire lock:
ffff888001888130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
iso_connect_cfm+0x2de/0x690

but task is already holding lock:
ffff8880028d1c20 (&conn->lock){+.+.}-{2:2}, at:
iso_connect_cfm+0x265/0x690

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&conn->lock){+.+.}-{2:2}:
       lock_acquire+0x176/0x3d0
       _raw_spin_lock+0x2a/0x40
       __iso_sock_close+0x1dd/0x4f0
       iso_sock_release+0xa0/0x1b0
       sock_close+0x5e/0x120
       __fput+0x102/0x410
       task_work_run+0xf1/0x160
       exit_to_user_mode_prepare+0x170/0x180
       syscall_exit_to_user_mode+0x19/0x50
       do_syscall_64+0x4e/0x90
       entry_SYSCALL_64_after_hwframe+0x62/0xcc

-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
       check_prev_add+0xfc/0x1190
       __lock_acquire+0x1e27/0x2750
       lock_acquire+0x176/0x3d0
       lock_sock_nested+0x32/0x80
       iso_connect_cfm+0x2de/0x690
       hci_cc_le_setup_iso_path+0x195/0x340
       hci_cmd_complete_evt+0x1ae/0x500
       hci_event_packet+0x38e/0x7c0
       hci_rx_work+0x34c/0x980
       process_one_work+0x5a5/0x9a0
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x22/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&conn->lock);
                               lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
                               lock(&conn->lock);
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);

 *** DEADLOCK ***

Fixes: ccf74f2 ("Bluetooth: Add BTPROTO_ISO socket type")
Fixes: f764a6c ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit e9d50f7 ]

This fixes the following trace caused by attempting to lock
cmd_sync_work_lock while holding the rcu_read_lock:

kworker/u3:2/212 is trying to lock:
ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at:
hci_cmd_sync_queue+0xad/0x140
other info that might help us debug this:
context-{4:4}
4 locks held by kworker/u3:2/212:
 #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
 process_one_work+0x4dc/0x9a0
 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
 at: process_one_work+0x4dc/0x9a0
 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at:
 hci_cc_le_set_cig_params+0x64/0x4f0
 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at:
 hci_cc_le_set_cig_params+0x2f9/0x4f0

Fixes: 26afbd8 ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit 5aa5610 ]

The cited commit changed class of tc_ht internal mutex in order to avoid
false lock dependency with fs_core node and flow_table hash table
structures. However, hash table implementation internally also includes a
workqueue task with its own lockdep map which causes similar bogus lockdep
splat[0]. Fix it by also adding dedicated class for hash table workqueue
work structure of tc_ht.

[0]:

[ 1139.672465] ======================================================
[ 1139.673552] WARNING: possible circular locking dependency detected
[ 1139.674635] 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 Not tainted
[ 1139.675734] ------------------------------------------------------
[ 1139.676801] modprobe/5998 is trying to acquire lock:
[ 1139.677726] ffff88811e7b93b8 (&node->lock){++++}-{3:3}, at: down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.679662]
               but task is already holding lock:
[ 1139.680703] ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0
[ 1139.682223]
               which lock already depends on the new lock.

[ 1139.683640]
               the existing dependency chain (in reverse order) is:
[ 1139.684887]
               -> #2 (&tc_ht_lock_key){+.+.}-{3:3}:
[ 1139.685975]        __mutex_lock+0x12c/0x14b0
[ 1139.686659]        rht_deferred_worker+0x35/0x1540
[ 1139.687405]        process_one_work+0x7c2/0x1310
[ 1139.688134]        worker_thread+0x59d/0xec0
[ 1139.688820]        kthread+0x28f/0x330
[ 1139.689444]        ret_from_fork+0x1f/0x30
[ 1139.690106]
               -> #1 ((work_completion)(&ht->run_work)){+.+.}-{0:0}:
[ 1139.691250]        __flush_work+0xe8/0x900
[ 1139.691915]        __cancel_work_timer+0x2ca/0x3f0
[ 1139.692655]        rhashtable_free_and_destroy+0x22/0x6f0
[ 1139.693472]        del_sw_flow_table+0x22/0xb0 [mlx5_core]
[ 1139.694592]        tree_put_node+0x24c/0x450 [mlx5_core]
[ 1139.695686]        tree_remove_node+0x6e/0x100 [mlx5_core]
[ 1139.696803]        mlx5_destroy_flow_table+0x187/0x690 [mlx5_core]
[ 1139.698017]        mlx5e_tc_nic_cleanup+0x2f8/0x400 [mlx5_core]
[ 1139.699217]        mlx5e_cleanup_nic_rx+0x2b/0x210 [mlx5_core]
[ 1139.700397]        mlx5e_detach_netdev+0x19d/0x2b0 [mlx5_core]
[ 1139.701571]        mlx5e_suspend+0xdb/0x140 [mlx5_core]
[ 1139.702665]        mlx5e_remove+0x89/0x190 [mlx5_core]
[ 1139.703756]        auxiliary_bus_remove+0x52/0x70
[ 1139.704492]        device_release_driver_internal+0x3c1/0x600
[ 1139.705360]        bus_remove_device+0x2a5/0x560
[ 1139.706080]        device_del+0x492/0xb80
[ 1139.706724]        mlx5_rescan_drivers_locked+0x194/0x6a0 [mlx5_core]
[ 1139.707961]        mlx5_unregister_device+0x7a/0xa0 [mlx5_core]
[ 1139.709138]        mlx5_uninit_one+0x5f/0x160 [mlx5_core]
[ 1139.710252]        remove_one+0xd1/0x160 [mlx5_core]
[ 1139.711297]        pci_device_remove+0x96/0x1c0
[ 1139.722721]        device_release_driver_internal+0x3c1/0x600
[ 1139.723590]        unbind_store+0x1b1/0x200
[ 1139.724259]        kernfs_fop_write_iter+0x348/0x520
[ 1139.725019]        vfs_write+0x7b2/0xbf0
[ 1139.725658]        ksys_write+0xf3/0x1d0
[ 1139.726292]        do_syscall_64+0x3d/0x90
[ 1139.726942]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.727769]
               -> #0 (&node->lock){++++}-{3:3}:
[ 1139.728698]        __lock_acquire+0x2cf5/0x62f0
[ 1139.729415]        lock_acquire+0x1c1/0x540
[ 1139.730076]        down_write+0x8e/0x1f0
[ 1139.730709]        down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.731841]        mlx5_del_flow_rules+0x6f/0x610 [mlx5_core]
[ 1139.732982]        __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core]
[ 1139.734207]        mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core]
[ 1139.735491]        mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core]
[ 1139.736716]        mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core]
[ 1139.738007]        mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core]
[ 1139.739213]        mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core]
[ 1139.740377]        _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core]
[ 1139.741534]        rhashtable_free_and_destroy+0x3be/0x6f0
[ 1139.742351]        mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core]
[ 1139.743512]        mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core]
[ 1139.744683]        mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
[ 1139.745860]        mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core]
[ 1139.747098]        mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
[ 1139.748372]        mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core]
[ 1139.749590]        __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core]
[ 1139.750813]        mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core]
[ 1139.752147]        mlx5e_rep_remove+0x62/0x80 [mlx5_core]
[ 1139.753293]        auxiliary_bus_remove+0x52/0x70
[ 1139.754028]        device_release_driver_internal+0x3c1/0x600
[ 1139.754885]        driver_detach+0xc1/0x180
[ 1139.755553]        bus_remove_driver+0xef/0x2e0
[ 1139.756260]        auxiliary_driver_unregister+0x16/0x50
[ 1139.757059]        mlx5e_rep_cleanup+0x19/0x30 [mlx5_core]
[ 1139.758207]        mlx5e_cleanup+0x12/0x30 [mlx5_core]
[ 1139.759295]        mlx5_cleanup+0xc/0x49 [mlx5_core]
[ 1139.760384]        __x64_sys_delete_module+0x2b5/0x450
[ 1139.761166]        do_syscall_64+0x3d/0x90
[ 1139.761827]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.762663]
               other info that might help us debug this:

[ 1139.763925] Chain exists of:
                 &node->lock --> (work_completion)(&ht->run_work) --> &tc_ht_lock_key

[ 1139.765743]  Possible unsafe locking scenario:

[ 1139.766688]        CPU0                    CPU1
[ 1139.767399]        ----                    ----
[ 1139.768111]   lock(&tc_ht_lock_key);
[ 1139.768704]                                lock((work_completion)(&ht->run_work));
[ 1139.769869]                                lock(&tc_ht_lock_key);
[ 1139.770770]   lock(&node->lock);
[ 1139.771326]
                *** DEADLOCK ***

[ 1139.772345] 2 locks held by modprobe/5998:
[ 1139.772994]  #0: ffff88813c1ff0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
[ 1139.774399]  #1: ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0
[ 1139.775822]
               stack backtrace:
[ 1139.776579] CPU: 3 PID: 5998 Comm: modprobe Not tainted 6.1.0_for_upstream_debug_2022_12_12_17_02 #1
[ 1139.777935] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1139.779529] Call Trace:
[ 1139.779992]  <TASK>
[ 1139.780409]  dump_stack_lvl+0x57/0x7d
[ 1139.781015]  check_noncircular+0x278/0x300
[ 1139.781687]  ? print_circular_bug+0x460/0x460
[ 1139.782381]  ? rcu_read_lock_sched_held+0x3f/0x70
[ 1139.783121]  ? lock_release+0x487/0x7c0
[ 1139.783759]  ? orc_find.part.0+0x1f1/0x330
[ 1139.784423]  ? mark_lock.part.0+0xef/0x2fc0
[ 1139.785091]  __lock_acquire+0x2cf5/0x62f0
[ 1139.785754]  ? register_lock_class+0x18e0/0x18e0
[ 1139.786483]  lock_acquire+0x1c1/0x540
[ 1139.787093]  ? down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.788195]  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
[ 1139.788978]  ? register_lock_class+0x18e0/0x18e0
[ 1139.789715]  down_write+0x8e/0x1f0
[ 1139.790292]  ? down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.791380]  ? down_write_killable+0x220/0x220
[ 1139.792080]  ? find_held_lock+0x2d/0x110
[ 1139.792713]  down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.793795]  mlx5_del_flow_rules+0x6f/0x610 [mlx5_core]
[ 1139.794879]  __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core]
[ 1139.796032]  ? __esw_offloads_unload_rep+0xd0/0xd0 [mlx5_core]
[ 1139.797227]  ? xa_load+0x11a/0x200
[ 1139.797800]  ? __xa_clear_mark+0xf0/0xf0
[ 1139.798438]  mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core]
[ 1139.799660]  mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core]
[ 1139.800821]  mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core]
[ 1139.802049]  ? mlx5_eswitch_get_uplink_priv+0x25/0x80 [mlx5_core]
[ 1139.803260]  mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core]
[ 1139.804398]  ? __cancel_work_timer+0x1c2/0x3f0
[ 1139.805099]  ? mlx5e_tc_unoffload_from_slow_path+0x460/0x460 [mlx5_core]
[ 1139.806387]  mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core]
[ 1139.807481]  _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core]
[ 1139.808564]  rhashtable_free_and_destroy+0x3be/0x6f0
[ 1139.809336]  ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core]
[ 1139.809336]  ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core]
[ 1139.810455]  mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core]
[ 1139.811552]  mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core]
[ 1139.812655]  mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
[ 1139.813768]  mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core]
[ 1139.814952]  mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
[ 1139.816166]  mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core]
[ 1139.817336]  __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core]
[ 1139.818507]  mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core]
[ 1139.819788]  ? mlx5_eswitch_uplink_get_proto_dev+0x30/0x30 [mlx5_core]
[ 1139.821051]  ? kernfs_find_ns+0x137/0x310
[ 1139.821705]  mlx5e_rep_remove+0x62/0x80 [mlx5_core]
[ 1139.822778]  auxiliary_bus_remove+0x52/0x70
[ 1139.823449]  device_release_driver_internal+0x3c1/0x600
[ 1139.824240]  driver_detach+0xc1/0x180
[ 1139.824842]  bus_remove_driver+0xef/0x2e0
[ 1139.825504]  auxiliary_driver_unregister+0x16/0x50
[ 1139.826245]  mlx5e_rep_cleanup+0x19/0x30 [mlx5_core]
[ 1139.827322]  mlx5e_cleanup+0x12/0x30 [mlx5_core]
[ 1139.828345]  mlx5_cleanup+0xc/0x49 [mlx5_core]
[ 1139.829382]  __x64_sys_delete_module+0x2b5/0x450
[ 1139.830119]  ? module_flags+0x300/0x300
[ 1139.830750]  ? task_work_func_match+0x50/0x50
[ 1139.831440]  ? task_work_cancel+0x20/0x20
[ 1139.832088]  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
[ 1139.832873]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 1139.833661]  ? trace_hardirqs_on+0x2d/0x100
[ 1139.834328]  do_syscall_64+0x3d/0x90
[ 1139.834922]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.835700] RIP: 0033:0x7f153e71288b
[ 1139.836302] Code: 73 01 c3 48 8b 0d 9d 75 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 75 0e 00 f7 d8 64 89 01 48
[ 1139.838866] RSP: 002b:00007ffe0a3ed938 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1139.840020] RAX: ffffffffffffffda RBX: 0000564c2cbf8220 RCX: 00007f153e71288b
[ 1139.841043] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564c2cbf8288
[ 1139.842072] RBP: 0000564c2cbf8220 R08: 0000000000000000 R09: 0000000000000000
[ 1139.843094] R10: 00007f153e7a3ac0 R11: 0000000000000206 R12: 0000564c2cbf8288
[ 1139.844118] R13: 0000000000000000 R14: 0000564c2cbf7ae8 R15: 00007ffe0a3efcb8

Fixes: 9ba3333 ("net/mlx5e: Avoid false lock depenency warning on tc_ht")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit f3dc1b3 ]

The first time dma_chan_get() is called for a channel the channel
client_count is incorrectly incremented twice for public channels,
first in balance_ref_count(), and again prior to returning. This
results in an incorrect client count which will lead to the
channel resources not being freed when they should be. A simple
 test of repeated module load and unload of async_tx on a Dell
 Power Edge R7425 also shows this resulting in a kref underflow
 warning.

[  124.329662] async_tx: api initialized (async)
[  129.000627] async_tx: api initialized (async)
[  130.047839] ------------[ cut here ]------------
[  130.052472] refcount_t: underflow; use-after-free.
[  130.057279] WARNING: CPU: 3 PID: 19364 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[  130.065811] Modules linked in: async_tx(-) rfkill intel_rapl_msr
intel_rapl_common amd64_edac edac_mce_amd ipmi_ssif kvm_amd dcdbas kvm
mgag200 drm_shmem_helper acpi_ipmi irqbypass drm_kms_helper ipmi_si
syscopyarea sysfillrect rapl pcspkr ipmi_devintf sysimgblt fb_sys_fops
k10temp i2c_piix4 ipmi_msghandler acpi_power_meter acpi_cpufreq vfat
fat drm fuse xfs libcrc32c sd_mod t10_pi sg ahci crct10dif_pclmul
libahci crc32_pclmul crc32c_intel ghash_clmulni_intel igb megaraid_sas
i40e libata i2c_algo_bit ccp sp5100_tco dca dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: async_tx]
[  130.117361] CPU: 3 PID: 19364 Comm: modprobe Kdump: loaded Not
tainted 5.14.0-185.el9.x86_64 #1
[  130.126091] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS
1.18.0 01/17/2022
[  130.133806] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  130.139041] Code: 01 01 e8 6d bd 55 00 0f 0b e9 72 9d 8a 00 80 3d
26 18 9c 01 00 75 85 48 c7 c7 f8 a3 03 9d c6 05 16 18 9c 01 01 e8 4a
bd 55 00 <0f> 0b e9 4f 9d 8a 00 80 3d 01 18 9c 01 00 0f 85 5e ff ff ff
48 c7
[  130.157807] RSP: 0018:ffffbf98898afe68 EFLAGS: 00010286
[  130.163036] RAX: 0000000000000000 RBX: ffff9da06028e598 RCX: 0000000000000000
[  130.170172] RDX: ffff9daf9de26480 RSI: ffff9daf9de198a0 RDI: ffff9daf9de198a0
[  130.177316] RBP: ffff9da7cddf3970 R08: 0000000000000000 R09: 00000000ffff7fff
[  130.184459] R10: ffffbf98898afd00 R11: ffffffff9d9e8c28 R12: ffff9da7cddf1970
[  130.191596] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  130.198739] FS:  00007f646435c740(0000) GS:ffff9daf9de00000(0000)
knlGS:0000000000000000
[  130.206832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  130.212586] CR2: 00007f6463b214f0 CR3: 00000008ab98c000 CR4: 00000000003506e0
[  130.219729] Call Trace:
[  130.222192]  <TASK>
[  130.224305]  dma_chan_put+0x10d/0x110
[  130.227988]  dmaengine_put+0x7a/0xa0
[  130.231575]  __do_sys_delete_module.constprop.0+0x178/0x280
[  130.237157]  ? syscall_trace_enter.constprop.0+0x145/0x1d0
[  130.242652]  do_syscall_64+0x5c/0x90
[  130.246240]  ? exc_page_fault+0x62/0x150
[  130.250178]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  130.255243] RIP: 0033:0x7f6463a3f5ab
[  130.258830] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48
83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89
01 48
[  130.277591] RSP: 002b:00007fff22f972c8 EFLAGS: 00000206 ORIG_RAX:
00000000000000b0
[  130.285164] RAX: ffffffffffffffda RBX: 000055b6786edd40 RCX: 00007f6463a3f5ab
[  130.292303] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b6786edda8
[  130.299443] RBP: 000055b6786edd40 R08: 0000000000000000 R09: 0000000000000000
[  130.306584] R10: 00007f6463b9eac0 R11: 0000000000000206 R12: 000055b6786edda8
[  130.313731] R13: 0000000000000000 R14: 000055b6786edda8 R15: 00007fff22f995f8
[  130.320875]  </TASK>
[  130.323081] ---[ end trace eff7156d56b5cf25 ]---

cat /sys/class/dma/dma0chan*/in_use would get the wrong result.
2
2
2

Fixes: d2f4f99 ("dmaengine: Rework dma_chan_get")
Signed-off-by: Koba Ko <koba.ko@canonical.com>
Reviewed-by: Jie Hai <haijie1@huawei.com>
Test-by: Jie Hai <haijie1@huawei.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/r/20221201030050.978595-1-koba.ko@canonical.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
commit ba81043 upstream.

There is a lock inversion and rwsem read-lock recursion in the devfreq
target callback which can lead to deadlocks.

Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock
read lock when toggling the write booster, which involves taking the
dev_cmd mutex before taking another clk_scaling_lock read lock.

This can lead to a deadlock if another thread:

  1) tries to acquire the dev_cmd and clk_scaling locks in the correct
     order, or

  2) takes a clk_scaling write lock before the attempt to take the
     clk_scaling read lock a second time.

Fix this by dropping the clk_scaling_lock before toggling the write booster
as was done before commit 0e9d4ca ("scsi: ufs: Protect some contexts
from unexpected clock scaling").

While the devfreq callbacks are already serialised, add a second
serialising mutex to handle the unlikely case where a callback triggered
through the devfreq sysfs interface is racing with a request to disable
clock scaling through the UFS controller 'clkscale_enable' sysfs
attribute. This could otherwise lead to the write booster being left
disabled after having disabled clock scaling.

Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any
pending write booster update has completed on return.

Note that this currently only affects Qualcomm platforms since commit
87bd050 ("scsi: ufs: core: Allow host driver to disable wb toggling
during clock scaling").

The lock inversion (i.e. 1 above) was reported by lockdep as:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.1.0-next-20221216 torvalds#211 Not tainted
 ------------------------------------------------------
 kworker/u16:2/71 is trying to acquire lock:
 ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0

 but task is already holding lock:
 ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380

 which lock already depends on the new lock.
[  +0.011606]
 the existing dependency chain (in reverse order) is:

 -> #1 (&hba->clk_scaling_lock){++++}-{3:3}:
        lock_acquire+0x68/0x90
        down_read+0x58/0x80
        ufshcd_exec_dev_cmd+0x70/0x2c0
        ufshcd_verify_dev_init+0x68/0x170
        ufshcd_probe_hba+0x398/0x1180
        ufshcd_async_scan+0x30/0x320
        async_run_entry_fn+0x34/0x150
        process_one_work+0x288/0x6c0
        worker_thread+0x74/0x450
        kthread+0x118/0x120
        ret_from_fork+0x10/0x20

 -> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}:
        __lock_acquire+0x12a0/0x2240
        lock_acquire.part.0+0xcc/0x220
        lock_acquire+0x68/0x90
        __mutex_lock+0x98/0x430
        mutex_lock_nested+0x2c/0x40
        ufshcd_query_flag+0x50/0x1c0
        ufshcd_query_flag_retry+0x64/0x100
        ufshcd_wb_toggle+0x5c/0x120
        ufshcd_devfreq_scale+0x2c4/0x380
        ufshcd_devfreq_target+0xf4/0x230
        devfreq_set_target+0x84/0x2f0
        devfreq_update_target+0xc4/0xf0
        devfreq_monitor+0x38/0x1f0
        process_one_work+0x288/0x6c0
        worker_thread+0x74/0x450
        kthread+0x118/0x120
        ret_from_fork+0x10/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(&hba->clk_scaling_lock);
                                lock(&hba->dev_cmd.lock);
                                lock(&hba->clk_scaling_lock);
   lock(&hba->dev_cmd.lock);

  *** DEADLOCK ***

Fixes: 0e9d4ca ("scsi: ufs: Protect some contexts from unexpected clock scaling")
Cc: stable@vger.kernel.org      # 5.12
Cc: Can Guo <quic_cang@quicinc.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230116161201.16923-1-johan+linaro@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit e38553b ]

The page_pool_release_page was used when freeing rx buffers, and this
function just unmaps the page (if mapped) and does not recycle the page.
So after hundreds of down/up the eth0, the system will out of memory.
For more details, please refer to the following reproduce steps and
bug logs. To solve this issue and refer to the doc of page pool, the
page_pool_put_full_page should be used to replace page_pool_release_page.
Because this API will try to recycle the page if the page refcnt equal to
1. After testing 20000 times, the issue can not be reproduced anymore
(about testing 391 times the issue will occur on i.MX8MN-EVK before).

Reproduce steps:
Create the test script and run the script. The script content is as
follows:
LOOPS=20000
i=1
while [ $i -le $LOOPS ]
do
    echo "TINFO:ENET $curface up and down test $i times"
    org_macaddr=$(cat /sys/class/net/eth0/address)
    ifconfig eth0 down
    ifconfig eth0  hw ether $org_macaddr up
    i=$(expr $i + 1)
done
sleep 5
if cat /sys/class/net/eth0/operstate | grep 'up';then
    echo "TEST PASS"
else
    echo "TEST FAIL"
fi

Bug detail logs:
TINFO:ENET  up and down test 391 times
[  850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[  853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[  870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
[  931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
TINFO:ENET  up and down test 392 times
[  991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
[ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
[ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
[ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
[ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
[ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
[ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
TINFO:ENET  up and down test 393 times
[ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
[ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
[ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
[ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
[ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
[ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
[ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
[ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
[ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
[ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
[ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
TINFO:ENET  up and down test 394 times
[ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
[ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
[ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
[ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G         C         6.1.1+g56321e101aca #1
[ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
[ 1537.115561] Call trace:
[ 1537.118005]  dump_backtrace.part.0+0xe0/0xf0
[ 1537.122289]  show_stack+0x18/0x40
[ 1537.125608]  dump_stack_lvl+0x64/0x80
[ 1537.129276]  dump_stack+0x18/0x34
[ 1537.132592]  dump_header+0x44/0x208
[ 1537.136083]  oom_kill_process+0x2b4/0x2c0
[ 1537.140097]  out_of_memory+0xe4/0x594
[ 1537.143766]  __alloc_pages+0xb68/0xd00
[ 1537.147521]  alloc_pages+0xac/0x160
[ 1537.151013]  __get_free_pages+0x14/0x40
[ 1537.154851]  pgd_alloc+0x1c/0x30
[ 1537.158082]  mm_init+0xf8/0x1d0
[ 1537.161228]  mm_alloc+0x48/0x60
[ 1537.164368]  alloc_bprm+0x7c/0x240
[ 1537.167777]  do_execveat_common.isra.0+0x70/0x240
[ 1537.172486]  __arm64_sys_execve+0x40/0x54
[ 1537.176502]  invoke_syscall+0x48/0x114
[ 1537.180255]  el0_svc_common.constprop.0+0xcc/0xec
[ 1537.184964]  do_el0_svc+0x2c/0xd0
[ 1537.188280]  el0_svc+0x2c/0x84
[ 1537.191340]  el0t_64_sync_handler+0xf4/0x120
[ 1537.195613]  el0t_64_sync+0x18c/0x190
[ 1537.199334] Mem-Info:
[ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
[ 1537.201620]  active_file:54 inactive_file:112 isolated_file:0
[ 1537.201620]  unevictable:0 dirty:0 writeback:0
[ 1537.201620]  slab_reclaimable:2620 slab_unreclaimable:7076
[ 1537.201620]  mapped:1489 shmem:2473 pagetables:466
[ 1537.201620]  sec_pagetables:0 bounce:0
[ 1537.201620]  kernel_misc_reclaimable:0
[ 1537.201620]  free:136672 free_pcp:96 free_cma:129241
[ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
[ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
[ 1537.300219] lowmem_reserve[]: 0 0 0 0
[ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
[ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1537.358087] 2939 total pagecache pages
[ 1537.361876] 0 pages in swap cache
[ 1537.365229] Free swap  = 0kB
[ 1537.368147] Total swap = 0kB
[ 1537.371065] 516096 pages RAM
[ 1537.373959] 0 pages HighMem/MovableOnly
[ 1537.377834] 17302 pages reserved
[ 1537.381103] 163840 pages cma reserved
[ 1537.384809] 0 pages hwpoisoned
[ 1537.387902] Tasks state (memory values in pages):
[ 1537.392652] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 1537.401356] [    201]   993   201     1130       72    45056        0             0 rpcbind
[ 1537.409772] [    202]     0   202     4529     1640    77824        0          -250 systemd-journal
[ 1537.418861] [    222]     0   222     4691      801    69632        0         -1000 systemd-udevd
[ 1537.427787] [    248]   994   248    20914      130    65536        0             0 systemd-timesyn
[ 1537.436884] [    497]     0   497      620       31    49152        0             0 atd
[ 1537.444938] [    500]     0   500      854       77    53248        0             0 crond
[ 1537.453165] [    503]   997   503     1470      160    49152        0          -900 dbus-daemon
[ 1537.461908] [    505]     0   505      633       24    40960        0             0 firmwared
[ 1537.470491] [    513]     0   513     2507      180    61440        0             0 ofonod
[ 1537.478800] [    514]   990   514    69640      137    81920        0             0 parsec
[ 1537.487120] [    533]     0   533      599       39    40960        0             0 syslogd
[ 1537.495518] [    534]     0   534     4546      148    65536        0             0 systemd-logind
[ 1537.504560] [    535]     0   535      690       24    45056        0             0 tee-supplicant
[ 1537.513564] [    540]   996   540     2769      168    61440        0             0 systemd-network
[ 1537.522680] [    566]     0   566     3878      228    77824        0             0 connmand
[ 1537.531168] [    645]   998   645     1538      133    57344        0             0 avahi-daemon
[ 1537.540004] [    646]   998   646     1461       64    57344        0             0 avahi-daemon
[ 1537.548846] [    648]   992   648      781       41    45056        0             0 rpc.statd
[ 1537.557415] [    650] 64371   650      590       23    45056        0             0 ninfod
[ 1537.565754] [    653] 61563   653      555       24    45056        0             0 rdisc
[ 1537.573971] [    655]     0   655   374569     2999   290816        0          -999 containerd
[ 1537.582621] [    658]     0   658     1311       20    49152        0             0 agetty
[ 1537.590922] [    663]     0   663     1529       97    49152        0             0 login
[ 1537.599138] [    666]     0   666     3430      202    69632        0             0 wpa_supplicant
[ 1537.608147] [    667]     0   667     2344       96    61440        0             0 systemd-userdbd
[ 1537.617240] [    677]     0   677     2964      314    65536        0           100 systemd
[ 1537.625651] [    679]     0   679     3720      646    73728        0           100 (sd-pam)
[ 1537.634138] [    687]     0   687     1289      403    45056        0             0 sh
[ 1537.642108] [    789]     0   789      970       93    45056        0             0 eth_test2.sh
[ 1537.650955] [   2355]     0  2355     2346       94    61440        0             0 systemd-userwor
[ 1537.660046] [   2356]     0  2356     2346       94    61440        0             0 systemd-userwor
[ 1537.669137] [   2358]     0  2358     2346       95    57344        0             0 systemd-userwor
[ 1537.678258] [   2379]     0  2379      970       93    45056        0             0 eth_test2.sh
[ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0
[ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
[ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec

Fixes: 95698ff ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: shenwei wang <Shenwei.wang@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
[ Upstream commit ca02549 ]

Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following
crash:

[   32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170
[   32.093115] Oops [#1]
[   32.093251] Modules linked in:
[   32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16
[   32.093985] Hardware name: riscv-virtio,qemu (DT)
[   32.094280] epc : ksys_read+0x88/0xd6
[   32.094855]  ra : ksys_read+0xc0/0xd6
[   32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0
[   32.095227]  gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80
[   32.095500]  t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60
[   32.095716]  s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708
[   32.095921]  a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300
[   32.096171]  a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff
[   32.096411]  s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170
[   32.096638]  s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030
[   32.096865]  s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410
[   32.097092]  s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c
[   32.097317]  t5 : ffffffff8000c29c t6 : ffffffff800dbc54
[   32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d
[   32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6
[   32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38
[   32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2

Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)',
hence it requires obtaining the target address from rs1 followed by
updating rd.

Fixes: c22b0bc ("riscv: Add kprobes supported")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com
[Palmer: Pick Guo's cleanup]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
commit fe0ba8c upstream.

Commit f1e5250 ("x86/boot: Skip realmode init code when running as
Xen PV guest") missed one code path accessing real_mode_header, leading
to dereferencing NULL when suspending the system under Xen:

    [  348.284004] PM: suspend entry (deep)
    [  348.289532] Filesystems sync: 0.005 seconds
    [  348.291545] Freezing user space processes ... (elapsed 0.000 seconds) done.
    [  348.292457] OOM killer disabled.
    [  348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 seconds) done.
    [  348.396612] printk: Suspending console(s) (use no_console_suspend to debug)
    [  348.749228] PM: suspend devices took 0.352 seconds
    [  348.769713] ACPI: EC: interrupt blocked
    [  348.816077] BUG: kernel NULL pointer dereference, address: 000000000000001c
    [  348.816080] #PF: supervisor read access in kernel mode
    [  348.816081] #PF: error_code(0x0000) - not-present page
    [  348.816083] PGD 0 P4D 0
    [  348.816086] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 6.1.3-1.fc32.qubes.x86_64 #1
    [  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 07/03/2022
    [  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20

Fix that by adding an optional acpi callback allowing to skip setting
the wakeup address, as in the Xen PV case this will be handled by the
hypervisor anyway.

Fixes: f1e5250 ("x86/boot: Skip realmode init code when running as Xen PV guest")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20230117155724.22940-1-jgross%40suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
…lume

[ Upstream commit 8226e37 ]

The freeing of relinquished volume will wake up the pending volume
acquisition by using wake_up_bit(), however it is mismatched with
wait_var_event() used in fscache_wait_on_volume_collision() and it will
never wake up the waiter in the wait-queue because these two functions
operate on different wait-queues.

According to the implementation in fscache_wait_on_volume_collision(),
if the wake-up of pending acquisition is delayed longer than 20 seconds
(e.g., due to the delay of on-demand fd closing), the first
wait_var_event_timeout() will timeout and the following wait_var_event()
will hang forever as shown below:

 FS-Cache: Potential volume collision new=00000024 old=00000022
 ......
 INFO: task mount:1148 blocked for more than 122 seconds.
       Not tainted 6.1.0-rc6+ #1
 task:mount           state:D stack:0     pid:1148  ppid:1
 Call Trace:
  <TASK>
  __schedule+0x2f6/0xb80
  schedule+0x67/0xe0
  fscache_wait_on_volume_collision.cold+0x80/0x82
  __fscache_acquire_volume+0x40d/0x4e0
  erofs_fscache_register_volume+0x51/0xe0 [erofs]
  erofs_fscache_register_fs+0x19c/0x240 [erofs]
  erofs_fc_fill_super+0x746/0xaf0 [erofs]
  vfs_get_super+0x7d/0x100
  get_tree_nodev+0x16/0x20
  erofs_fc_get_tree+0x20/0x30 [erofs]
  vfs_get_tree+0x24/0xb0
  path_mount+0x2fa/0xa90
  do_mount+0x7c/0xa0
  __x64_sys_mount+0x8b/0xe0
  do_syscall_64+0x30/0x60
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Considering that wake_up_bit() is more selective, so fix it by using
wait_on_bit() instead of wait_var_event() to wait for the freeing of
relinquished volume. In addition because waitqueue_active() is used in
wake_up_bit() and clear_bit() doesn't imply any memory barrier, use
clear_and_wake_up_bit() to add the missing memory barrier between
cursor->flags and waitqueue_active().

Fixes: 62ab633 ("fscache: Implement volume registration")
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20230113115211.2895845-2-houtao@huaweicloud.com/ # v3
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
… UAF

[ Upstream commit 226fae1 ]

After a call to console_unlock() in vcs_read() the vc_data struct can be
freed by vc_deallocate(). Because of that, the struct vc_data pointer
load must be done at the top of while loop in vcs_read() to avoid a UAF
when vcs_size() is called.

Syzkaller reported a UAF in vcs_size().

BUG: KASAN: use-after-free in vcs_size (drivers/tty/vt/vc_screen.c:215)
Read of size 4 at addr ffff8881137479a8 by task 4a005ed81e27e65/1537

CPU: 0 PID: 1537 Comm: 4a005ed81e27e65 Not tainted 6.2.0-rc5 #1
Hardware name: Red Hat KVM, BIOS 1.15.0-2.module
Call Trace:
  <TASK>
__asan_report_load4_noabort (mm/kasan/report_generic.c:350)
vcs_size (drivers/tty/vt/vc_screen.c:215)
vcs_read (drivers/tty/vt/vc_screen.c:415)
vfs_read (fs/read_write.c:468 fs/read_write.c:450)
...
  </TASK>

Allocated by task 1191:
...
kmalloc_trace (mm/slab_common.c:1069)
vc_allocate (./include/linux/slab.h:580 ./include/linux/slab.h:720
     drivers/tty/vt/vt.c:1128 drivers/tty/vt/vt.c:1108)
con_install (drivers/tty/vt/vt.c:3383)
tty_init_dev (drivers/tty/tty_io.c:1301 drivers/tty/tty_io.c:1413
     drivers/tty/tty_io.c:1390)
tty_open (drivers/tty/tty_io.c:2080 drivers/tty/tty_io.c:2126)
chrdev_open (fs/char_dev.c:415)
do_dentry_open (fs/open.c:883)
vfs_open (fs/open.c:1014)
...

Freed by task 1548:
...
kfree (mm/slab_common.c:1021)
vc_port_destruct (drivers/tty/vt/vt.c:1094)
tty_port_destructor (drivers/tty/tty_port.c:296)
tty_port_put (drivers/tty/tty_port.c:312)
vt_disallocate_all (drivers/tty/vt/vt_ioctl.c:662 (discriminator 2))
vt_ioctl (drivers/tty/vt/vt_ioctl.c:903)
tty_ioctl (drivers/tty/tty_io.c:2776)
...

The buggy address belongs to the object at ffff888113747800
  which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 424 bytes inside of
  1024-byte region [ffff888113747800, ffff888113747c00)

The buggy address belongs to the physical page:
page:00000000b3fe6c7c refcount:1 mapcount:0 mapping:0000000000000000
     index:0x0 pfn:0x113740
head:00000000b3fe6c7c order:3 compound_mapcount:0 subpages_mapcount:0
     compound_pincount:0
anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0010200 ffff888100042dc0 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff888113747880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888113747900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888113747980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                   ^
  ffff888113747a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888113747a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint

Fixes: ac751ef ("console: rename acquire/release_console_sem() to console_lock/unlock()")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Link: https://lore.kernel.org/r/1674577014-12374-1-git-send-email-george.kennedy@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
commit 98d0219 upstream.

If a relocatable kernel is loaded at an address that is not 2MB aligned
and told not to relocate to zero, the kernel can crash due to
mark_rodata_ro() incorrectly changing some read-write data to read-only.

Scenarios where the misalignment can occur are when the kernel is
loaded by kdump or using the RELOCATABLE_TEST config option.

Example crash with the kernel loaded at 5MB:

  Run /sbin/init as init process
  BUG: Unable to handle kernel data access on write at 0xc000000000452000
  Faulting instruction address: 0xc0000000005b6730
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 torvalds#166
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
  NIP:  c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
  REGS: c000000004503250 TRAP: 0300   Not tainted  (6.2.0-rc1-00011-g349188be4841)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44288480  XER: 00000000
  CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
  ...
  NIP memset+0x68/0x104
  LR  zero_user_segments.constprop.0+0xa8/0xf0
  Call Trace:
    ext4_mpage_readpages+0x7f8/0x830
    ext4_readahead+0x48/0x60
    read_pages+0xb8/0x380
    page_cache_ra_unbounded+0x19c/0x250
    filemap_fault+0x58c/0xae0
    __do_fault+0x60/0x100
    __handle_mm_fault+0x1230/0x1a40
    handle_mm_fault+0x120/0x300
    ___do_page_fault+0x20c/0xa80
    do_page_fault+0x30/0xc0
    data_access_common_virt+0x210/0x220

This happens because mark_rodata_ro() tries to change permissions on the
range _stext..__end_rodata, but _stext sits in the middle of the 2MB
page from 4MB to 6MB:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)

The logic that changes the permissions assumes the linear mapping was
split correctly at boot, so it marks the entire 2MB page read-only. That
leads to the write fault above.

To fix it, the boot time mapping logic needs to consider that if the
kernel is running at a non-zero address then _stext is a boundary where
it must split the mapping.

That leads to the mapping being split correctly, allowing the rodata
permission change to take happen correctly, with no spillover:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
  radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
  radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)

If the kernel is loaded at a 2MB aligned address, the mapping continues
to use 2MB pages as before:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages

Fixes: c55d7b5 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 10, 2023
commit ad53db4 upstream.

The recent commit 76d588d ("powerpc/imc-pmu: Fix use of mutex in
IRQs disabled section") fixed warnings (and possible deadlocks) in the
IMC PMU driver by converting the locking to use spinlocks.

It also converted the init-time nest_init_lock to a spinlock, even
though it's not used at runtime in IRQ disabled sections or while
holding other spinlocks.

This leads to warnings such as:

  BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
  preempt_count: 1, expected: 0
  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1
  Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV
  Call Trace:
    dump_stack_lvl+0x74/0xa8 (unreliable)
    __might_resched+0x178/0x1a0
    __cpuhp_setup_state+0x64/0x1e0
    init_imc_pmu+0xe48/0x1250
    opal_imc_counters_probe+0x30c/0x6a0
    platform_probe+0x78/0x110
    really_probe+0x104/0x420
    __driver_probe_device+0xb0/0x170
    driver_probe_device+0x58/0x180
    __driver_attach+0xd8/0x250
    bus_for_each_dev+0xb4/0x140
    driver_attach+0x34/0x50
    bus_add_driver+0x1e8/0x2d0
    driver_register+0xb4/0x1c0
    __platform_driver_register+0x38/0x50
    opal_imc_driver_init+0x2c/0x40
    do_one_initcall+0x80/0x360
    kernel_init_freeable+0x310/0x3b8
    kernel_init+0x30/0x1a0
    ret_from_kernel_thread+0x5c/0x64

Fix it by converting nest_init_lock back to a mutex, so that we can call
sleeping functions while holding it. There is no interaction between
nest_init_lock and the runtime spinlocks used by the actual PMU routines.

Fixes: 76d588d ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
Tested-by: Kajol Jain<kjain@linux.ibm.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 19, 2023
[ Upstream commit e632291 ]

The cited commit creates child PKEY interfaces over netlink will
multiple tx and rx queues, but some devices doesn't support more than 1
tx and 1 rx queues. This causes to a crash when traffic is sent over the
PKEY interface due to the parent having a single queue but the child
having multiple queues.

This patch fixes the number of queues to 1 for legacy IPoIB at the
earliest possible point in time.

BUG: kernel NULL pointer dereference, address: 000000000000036b
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmem_cache_alloc+0xcb/0x450
Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a
01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b
RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202
RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae
RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00
RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40
R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000
R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000
FS:  00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 skb_clone+0x55/0xd0
 ip6_finish_output2+0x3fe/0x690
 ip6_finish_output+0xfa/0x310
 ip6_send_skb+0x1e/0x60
 udp_v6_send_skb+0x1e5/0x420
 udpv6_sendmsg+0xb3c/0xe60
 ? ip_mc_finish_output+0x180/0x180
 ? __switch_to_asm+0x3a/0x60
 ? __switch_to_asm+0x34/0x60
 sock_sendmsg+0x33/0x40
 __sys_sendto+0x103/0x160
 ? _copy_to_user+0x21/0x30
 ? kvm_clock_get_cycles+0xd/0x10
 ? ktime_get_ts64+0x49/0xe0
 __x64_sys_sendto+0x25/0x30
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f9374f1ed14
Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b
7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b
RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14
RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030
RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc
 </TASK>

Fixes: dbc94a0 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/95eb6b74c7cf49fa46281f9d056d685c9fa11d38.1674584576.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 19, 2023
[ Upstream commit 4d159f7 ]

When both ice and the irdma driver are loaded, a warning in
check_flush_dependency is being triggered. This is due to ice driver
workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one
is not.

According to kernel documentation, this flag should be set if the
workqueue will be involved in the kernel's memory reclamation flow.
Since it is not, there is no need for the ice driver's WQ to have this
flag set so remove it.

Example trace:

[  +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0
[  +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0
[  +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha
in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel
_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_
core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs
ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter
acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba
ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[  +0.000161]  [last unloaded: bonding]
[  +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S                 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1
[  +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  +0.000003] Workqueue: ice ice_service_task [ice]
[  +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0
[  +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08
9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06
[  +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282
[  +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000
[  +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80
[  +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112
[  +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000
[  +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400
[  +0.000004] FS:  0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000
[  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0
[  +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.000002] PKRU: 55555554
[  +0.000003] Call Trace:
[  +0.000002]  <TASK>
[  +0.000003]  __flush_workqueue+0x203/0x840
[  +0.000006]  ? mutex_unlock+0x84/0xd0
[  +0.000008]  ? __pfx_mutex_unlock+0x10/0x10
[  +0.000004]  ? __pfx___flush_workqueue+0x10/0x10
[  +0.000006]  ? mutex_lock+0xa3/0xf0
[  +0.000005]  ib_cache_cleanup_one+0x39/0x190 [ib_core]
[  +0.000174]  __ib_unregister_device+0x84/0xf0 [ib_core]
[  +0.000094]  ib_unregister_device+0x25/0x30 [ib_core]
[  +0.000093]  irdma_ib_unregister_device+0x97/0xc0 [irdma]
[  +0.000064]  ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma]
[  +0.000059]  ? up_write+0x5c/0x90
[  +0.000005]  irdma_remove+0x36/0x90 [irdma]
[  +0.000062]  auxiliary_bus_remove+0x32/0x50
[  +0.000007]  device_release_driver_internal+0xfa/0x1c0
[  +0.000005]  bus_remove_device+0x18a/0x260
[  +0.000007]  device_del+0x2e5/0x650
[  +0.000005]  ? __pfx_device_del+0x10/0x10
[  +0.000003]  ? mutex_unlock+0x84/0xd0
[  +0.000004]  ? __pfx_mutex_unlock+0x10/0x10
[  +0.000004]  ? _raw_spin_unlock+0x18/0x40
[  +0.000005]  ice_unplug_aux_dev+0x52/0x70 [ice]
[  +0.000160]  ice_service_task+0x1309/0x14f0 [ice]
[  +0.000134]  ? __pfx___schedule+0x10/0x10
[  +0.000006]  process_one_work+0x3b1/0x6c0
[  +0.000008]  worker_thread+0x69/0x670
[  +0.000005]  ? __kthread_parkme+0xec/0x110
[  +0.000007]  ? __pfx_worker_thread+0x10/0x10
[  +0.000005]  kthread+0x17f/0x1b0
[  +0.000005]  ? __pfx_kthread+0x10/0x10
[  +0.000004]  ret_from_fork+0x29/0x50
[  +0.000009]  </TASK>

Fixes: 940b61a ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Feb 19, 2023
commit 462a8e0 upstream.

When we upgraded our kernel, we started seeing some page corruption like
the following consistently:

  BUG: Bad page state in process ganesha.nfsd  pfn:1304ca
  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
  flags: 0x17ffffc0000000()
  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
  page dumped because: nonzero mapcount
  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P    B      O      5.10.158-1.nutanix.20221209.el7.x86_64 #1
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
  Call Trace:
   dump_stack+0x74/0x96
   bad_page.cold+0x63/0x94
   check_new_page_bad+0x6d/0x80
   rmqueue+0x46e/0x970
   get_page_from_freelist+0xcb/0x3f0
   ? _cond_resched+0x19/0x40
   __alloc_pages_nodemask+0x164/0x300
   alloc_pages_current+0x87/0xf0
   skb_page_frag_refill+0x84/0x110
   ...

Sometimes, it would also show up as corruption in the free list pointer
and cause crashes.

After bisecting the issue, we found the issue started from commit
e320d30 ("mm/page_alloc.c: fix freeing non-compound pages"):

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!PageHead(page))
		while (order-- > 0)
			free_the_page(page + (1 << order), order);

So the problem is the check PageHead is racy because at this point we
already dropped our reference to the page.  So even if we came in with
compound page, the page can already be freed and PageHead can return
false and we will end up freeing all the tail pages causing double free.

Fixes: e320d30 ("mm/page_alloc.c: fix freeing non-compound pages")
Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 19, 2023
commit 5ad7bbf upstream.

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli torvalds#338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 23, 2023
commit 55d77ba upstream.

On powerpc64, you can build a kernel with KASAN as soon as you build it
with RADIX MMU support.  However if the CPU doesn't have RADIX MMU, KASAN
isn't enabled at init and the following Oops is encountered.

  [    0.000000][    T0] KASAN not enabled as it requires radix!

  [    4.484295][   T26] BUG: Unable to handle kernel data access at 0xc00e000000804a04
  [    4.485270][   T26] Faulting instruction address: 0xc00000000062ec6c
  [    4.485748][   T26] Oops: Kernel access of bad area, sig: 11 [#1]
  [    4.485920][   T26] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  [    4.486259][   T26] Modules linked in:
  [    4.486637][   T26] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 6.2.0-rc3-02590-gf8a023b0a805 torvalds#249
  [    4.486907][   T26] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,HEAD pSeries
  [    4.487445][   T26] Workqueue: eval_map_wq .tracer_init_tracefs_work_func
  [    4.488744][   T26] NIP:  c00000000062ec6c LR: c00000000062bb84 CTR: c0000000002ebcd0
  [    4.488867][   T26] REGS: c0000000049175c0 TRAP: 0380   Not tainted  (6.2.0-rc3-02590-gf8a023b0a805)
  [    4.489028][   T26] MSR:  8000000002009032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 44002808  XER: 00000000
  [    4.489584][   T26] CFAR: c00000000062bb80 IRQMASK: 0
  [    4.489584][   T26] GPR00: c0000000005624d4 c000000004917860 c000000001cfc000 1800000000804a04
  [    4.489584][   T26] GPR04: c0000000003a2650 0000000000000cc0 c00000000000d3d8 c00000000000d3d8
  [    4.489584][   T26] GPR08: c0000000049175b0 a80e000000000000 0000000000000000 0000000017d78400
  [    4.489584][   T26] GPR12: 0000000044002204 c000000003790000 c00000000435003c c0000000043f1c40
  [    4.489584][   T26] GPR16: c0000000043f1c68 c0000000043501a0 c000000002106138 c0000000043f1c08
  [    4.489584][   T26] GPR20: c0000000043f1c10 c0000000043f1c20 c000000004146c40 c000000002fdb7f8
  [    4.489584][   T26] GPR24: c000000002fdb834 c000000003685e00 c000000004025030 c000000003522e90
  [    4.489584][   T26] GPR28: 0000000000000cc0 c0000000003a2650 c000000004025020 c000000004025020
  [    4.491201][   T26] NIP [c00000000062ec6c] .kasan_byte_accessible+0xc/0x20
  [    4.491430][   T26] LR [c00000000062bb84] .__kasan_check_byte+0x24/0x90
  [    4.491767][   T26] Call Trace:
  [    4.491941][   T26] [c000000004917860] [c00000000062ae70] .__kasan_kmalloc+0xc0/0x110 (unreliable)
  [    4.492270][   T26] [c0000000049178f0] [c0000000005624d4] .krealloc+0x54/0x1c0
  [    4.492453][   T26] [c000000004917990] [c0000000003a2650] .create_trace_option_files+0x280/0x530
  [    4.492613][   T26] [c000000004917a90] [c000000002050d90] .tracer_init_tracefs_work_func+0x274/0x2c0
  [    4.492771][   T26] [c000000004917b40] [c0000000001f9948] .process_one_work+0x578/0x9f0
  [    4.492927][   T26] [c000000004917c30] [c0000000001f9ebc] .worker_thread+0xfc/0x950
  [    4.493084][   T26] [c000000004917d60] [c00000000020be84] .kthread+0x1a4/0x1b0
  [    4.493232][   T26] [c000000004917e10] [c00000000000d3d8] .ret_from_kernel_thread+0x58/0x60
  [    4.495642][   T26] Code: 60000000 7cc802a6 38a00000 4bfffc78 60000000 7cc802a6 38a00001 4bfffc68 60000000 3d20a80e 7863e8c2 792907c6 <7c6348ae> 20630007 78630fe0 68630001
  [    4.496704][   T26] ---[ end trace 0000000000000000 ]---

The Oops is due to kasan_byte_accessible() not checking the readiness of
KASAN.  Add missing call to kasan_arch_is_ready() and bail out when not
ready.  The same problem is observed with ____kasan_kfree_large() so fix
it the same.

Also, as KASAN is not available and no shadow area is allocated for linear
memory mapping, there is no point in allocating shadow mem for vmalloc
memory as shown below in /sys/kernel/debug/kernel_page_tables

  ---[ kasan shadow mem start ]---
  0xc00f000000000000-0xc00f00000006ffff  0x00000000040f0000       448K         r  w       pte  valid  present        dirty  accessed
  0xc00f000000860000-0xc00f00000086ffff  0x000000000ac10000        64K         r  w       pte  valid  present        dirty  accessed
  0xc00f3ffffffe0000-0xc00f3fffffffffff  0x0000000004d10000       128K         r  w       pte  valid  present        dirty  accessed
  ---[ kasan shadow mem end ]---

So, also verify KASAN readiness before allocating and poisoning
shadow mem for VMAs.

Link: https://lkml.kernel.org/r/150768c55722311699fdcf8f5379e8256749f47d.1674716617.git.christophe.leroy@csgroup.eu
Fixes: 41b7a34 ("powerpc: Book3S 64-bit outline-only KASAN support")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Nathan Lynch <nathanl@linux.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>	[5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 23, 2023
commit 2558b80 upstream.

syzbot found arm64 builds would crash in sock_recv_mark()
when CONFIG_HARDENED_USERCOPY=y

x86 and powerpc are not detecting the issue because
they define user_access_begin.
This will be handled in a different patch,
because a check_object_size() is missing.

Only data from skb->cb[] can be copied directly to/from user space,
as explained in commit 79a8a64 ("net: Whitelist
the skbuff_head_cache "cb" field")

syzbot report was:
usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_head_cache' (offset 168, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:102 !
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 4410 Comm: syz-executor533 Not tainted 6.2.0-rc7-syzkaller-17907-g2d3827b3f393 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usercopy_abort+0x90/0x94 mm/usercopy.c:90
lr : usercopy_abort+0x90/0x94 mm/usercopy.c:90
sp : ffff80000fb9b9a0
x29: ffff80000fb9b9b0 x28: ffff0000c6073400 x27: 0000000020001a00
x26: 0000000000000014 x25: ffff80000cf52000 x24: fffffc0000000000
x23: 05ffc00000000200 x22: fffffc000324bf80 x21: ffff0000c92fe1a8
x20: 0000000000000001 x19: 0000000000000004 x18: 0000000000000000
x17: 656a626f2042554c x16: ffff0000c6073dd0 x15: ffff80000dbd2118
x14: ffff0000c6073400 x13: 00000000ffffffff x12: ffff0000c6073400
x11: ff808000081bbb4c x10: 0000000000000000 x9 : 7b0572d7cc0ccf00
x8 : 7b0572d7cc0ccf00 x7 : ffff80000bf650d4 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000000
x2 : ffff0001fefbff08 x1 : 0000000100000000 x0 : 000000000000006c
Call trace:
usercopy_abort+0x90/0x94 mm/usercopy.c:90
__check_heap_object+0xa8/0x100 mm/slub.c:4761
check_heap_object mm/usercopy.c:196 [inline]
__check_object_size+0x208/0x6b8 mm/usercopy.c:251
check_object_size include/linux/thread_info.h:199 [inline]
__copy_to_user include/linux/uaccess.h:115 [inline]
put_cmsg+0x408/0x464 net/core/scm.c:238
sock_recv_mark net/socket.c:975 [inline]
__sock_recv_cmsgs+0x1fc/0x248 net/socket.c:984
sock_recv_cmsgs include/net/sock.h:2728 [inline]
packet_recvmsg+0x2d8/0x678 net/packet/af_packet.c:3482
____sys_recvmsg+0x110/0x3a0
___sys_recvmsg net/socket.c:2737 [inline]
__sys_recvmsg+0x194/0x210 net/socket.c:2767
__do_sys_recvmsg net/socket.c:2777 [inline]
__se_sys_recvmsg net/socket.c:2774 [inline]
__arm64_sys_recvmsg+0x2c/0x3c net/socket.c:2774
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x64/0x178 arch/arm64/kernel/syscall.c:52
el0_svc_common+0xbc/0x180 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x110 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x14c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: 91388800 aa0903e1 f90003e8 94e6d752 (d4210000)

Fixes: 6fd1d51 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erin MacNeil <lnx.erin@gmail.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230213160059.3829741-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Feb 23, 2023
[ Upstream commit 42018a3 ]

Syzkaller found an issue where a handle greater than 16 bits would trigger
a null-ptr-deref in the imperfect hash area update.

general protection fault, probably for non-canonical address
0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af]
CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted
6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/21/2023
RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509
Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb
a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c
02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00
RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8
RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8
R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00
FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572
tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155
rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x334/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmmsg+0x18f/0x460 net/socket.c:2616
__do_sys_sendmmsg net/socket.c:2645 [inline]
__se_sys_sendmmsg net/socket.c:2642 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80

Fixes: ee05917 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Mar 9, 2023
There is a lock inversion and rwsem read-lock recursion in the devfreq
target callback which can lead to deadlocks.

Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock
read lock when toggling the write booster, which involves taking the
dev_cmd mutex before taking another clk_scaling_lock read lock.

This can lead to a deadlock if another thread:

  1) tries to acquire the dev_cmd and clk_scaling locks in the correct
     order, or

  2) takes a clk_scaling write lock before the attempt to take the
     clk_scaling read lock a second time.

Fix this by dropping the clk_scaling_lock before toggling the write booster
as was done before commit 0e9d4ca ("scsi: ufs: Protect some contexts
from unexpected clock scaling").

While the devfreq callbacks are already serialised, add a second
serialising mutex to handle the unlikely case where a callback triggered
through the devfreq sysfs interface is racing with a request to disable
clock scaling through the UFS controller 'clkscale_enable' sysfs
attribute. This could otherwise lead to the write booster being left
disabled after having disabled clock scaling.

Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any
pending write booster update has completed on return.

Note that this currently only affects Qualcomm platforms since commit
87bd050 ("scsi: ufs: core: Allow host driver to disable wb toggling
during clock scaling").

The lock inversion (i.e. 1 above) was reported by lockdep as:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.1.0-next-20221216 torvalds#211 Not tainted
 ------------------------------------------------------
 kworker/u16:2/71 is trying to acquire lock:
 ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0

 but task is already holding lock:
 ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380

 which lock already depends on the new lock.
[  +0.011606]
 the existing dependency chain (in reverse order) is:

 -> #1 (&hba->clk_scaling_lock){++++}-{3:3}:
        lock_acquire+0x68/0x90
        down_read+0x58/0x80
        ufshcd_exec_dev_cmd+0x70/0x2c0
        ufshcd_verify_dev_init+0x68/0x170
        ufshcd_probe_hba+0x398/0x1180
        ufshcd_async_scan+0x30/0x320
        async_run_entry_fn+0x34/0x150
        process_one_work+0x288/0x6c0
        worker_thread+0x74/0x450
        kthread+0x118/0x120
        ret_from_fork+0x10/0x20

 -> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}:
        __lock_acquire+0x12a0/0x2240
        lock_acquire.part.0+0xcc/0x220
        lock_acquire+0x68/0x90
        __mutex_lock+0x98/0x430
        mutex_lock_nested+0x2c/0x40
        ufshcd_query_flag+0x50/0x1c0
        ufshcd_query_flag_retry+0x64/0x100
        ufshcd_wb_toggle+0x5c/0x120
        ufshcd_devfreq_scale+0x2c4/0x380
        ufshcd_devfreq_target+0xf4/0x230
        devfreq_set_target+0x84/0x2f0
        devfreq_update_target+0xc4/0xf0
        devfreq_monitor+0x38/0x1f0
        process_one_work+0x288/0x6c0
        worker_thread+0x74/0x450
        kthread+0x118/0x120
        ret_from_fork+0x10/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(&hba->clk_scaling_lock);
                                lock(&hba->dev_cmd.lock);
                                lock(&hba->clk_scaling_lock);
   lock(&hba->dev_cmd.lock);

  *** DEADLOCK ***

Fixes: 0e9d4ca ("scsi: ufs: Protect some contexts from unexpected clock scaling")
Cc: stable@vger.kernel.org      # 5.12
Cc: Can Guo <quic_cang@quicinc.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230116161201.16923-1-johan+linaro@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
chewitt pushed a commit that referenced this pull request Mar 9, 2023
Commit f1e5250 ("x86/boot: Skip realmode init code when running as
Xen PV guest") missed one code path accessing real_mode_header, leading
to dereferencing NULL when suspending the system under Xen:

    [  348.284004] PM: suspend entry (deep)
    [  348.289532] Filesystems sync: 0.005 seconds
    [  348.291545] Freezing user space processes ... (elapsed 0.000 seconds) done.
    [  348.292457] OOM killer disabled.
    [  348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 seconds) done.
    [  348.396612] printk: Suspending console(s) (use no_console_suspend to debug)
    [  348.749228] PM: suspend devices took 0.352 seconds
    [  348.769713] ACPI: EC: interrupt blocked
    [  348.816077] BUG: kernel NULL pointer dereference, address: 000000000000001c
    [  348.816080] #PF: supervisor read access in kernel mode
    [  348.816081] #PF: error_code(0x0000) - not-present page
    [  348.816083] PGD 0 P4D 0
    [  348.816086] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 6.1.3-1.fc32.qubes.x86_64 #1
    [  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 07/03/2022
    [  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20

Fix that by adding an optional acpi callback allowing to skip setting
the wakeup address, as in the Xen PV case this will be handled by the
hypervisor anyway.

Fixes: f1e5250 ("x86/boot: Skip realmode init code when running as Xen PV guest")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20230117155724.22940-1-jgross%40suse.com
chewitt pushed a commit that referenced this pull request Mar 9, 2023
The page_pool_release_page was used when freeing rx buffers, and this
function just unmaps the page (if mapped) and does not recycle the page.
So after hundreds of down/up the eth0, the system will out of memory.
For more details, please refer to the following reproduce steps and
bug logs. To solve this issue and refer to the doc of page pool, the
page_pool_put_full_page should be used to replace page_pool_release_page.
Because this API will try to recycle the page if the page refcnt equal to
1. After testing 20000 times, the issue can not be reproduced anymore
(about testing 391 times the issue will occur on i.MX8MN-EVK before).

Reproduce steps:
Create the test script and run the script. The script content is as
follows:
LOOPS=20000
i=1
while [ $i -le $LOOPS ]
do
    echo "TINFO:ENET $curface up and down test $i times"
    org_macaddr=$(cat /sys/class/net/eth0/address)
    ifconfig eth0 down
    ifconfig eth0  hw ether $org_macaddr up
    i=$(expr $i + 1)
done
sleep 5
if cat /sys/class/net/eth0/operstate | grep 'up';then
    echo "TEST PASS"
else
    echo "TEST FAIL"
fi

Bug detail logs:
TINFO:ENET  up and down test 391 times
[  850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[  853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[  870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
[  931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
TINFO:ENET  up and down test 392 times
[  991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
[ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
[ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
[ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
[ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
[ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
[ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
TINFO:ENET  up and down test 393 times
[ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
[ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
[ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
[ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
[ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
[ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
[ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
[ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
[ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
[ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
[ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
TINFO:ENET  up and down test 394 times
[ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
[ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
[ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
[ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G         C         6.1.1+g56321e101aca #1
[ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
[ 1537.115561] Call trace:
[ 1537.118005]  dump_backtrace.part.0+0xe0/0xf0
[ 1537.122289]  show_stack+0x18/0x40
[ 1537.125608]  dump_stack_lvl+0x64/0x80
[ 1537.129276]  dump_stack+0x18/0x34
[ 1537.132592]  dump_header+0x44/0x208
[ 1537.136083]  oom_kill_process+0x2b4/0x2c0
[ 1537.140097]  out_of_memory+0xe4/0x594
[ 1537.143766]  __alloc_pages+0xb68/0xd00
[ 1537.147521]  alloc_pages+0xac/0x160
[ 1537.151013]  __get_free_pages+0x14/0x40
[ 1537.154851]  pgd_alloc+0x1c/0x30
[ 1537.158082]  mm_init+0xf8/0x1d0
[ 1537.161228]  mm_alloc+0x48/0x60
[ 1537.164368]  alloc_bprm+0x7c/0x240
[ 1537.167777]  do_execveat_common.isra.0+0x70/0x240
[ 1537.172486]  __arm64_sys_execve+0x40/0x54
[ 1537.176502]  invoke_syscall+0x48/0x114
[ 1537.180255]  el0_svc_common.constprop.0+0xcc/0xec
[ 1537.184964]  do_el0_svc+0x2c/0xd0
[ 1537.188280]  el0_svc+0x2c/0x84
[ 1537.191340]  el0t_64_sync_handler+0xf4/0x120
[ 1537.195613]  el0t_64_sync+0x18c/0x190
[ 1537.199334] Mem-Info:
[ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
[ 1537.201620]  active_file:54 inactive_file:112 isolated_file:0
[ 1537.201620]  unevictable:0 dirty:0 writeback:0
[ 1537.201620]  slab_reclaimable:2620 slab_unreclaimable:7076
[ 1537.201620]  mapped:1489 shmem:2473 pagetables:466
[ 1537.201620]  sec_pagetables:0 bounce:0
[ 1537.201620]  kernel_misc_reclaimable:0
[ 1537.201620]  free:136672 free_pcp:96 free_cma:129241
[ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
[ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
[ 1537.300219] lowmem_reserve[]: 0 0 0 0
[ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
[ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1537.358087] 2939 total pagecache pages
[ 1537.361876] 0 pages in swap cache
[ 1537.365229] Free swap  = 0kB
[ 1537.368147] Total swap = 0kB
[ 1537.371065] 516096 pages RAM
[ 1537.373959] 0 pages HighMem/MovableOnly
[ 1537.377834] 17302 pages reserved
[ 1537.381103] 163840 pages cma reserved
[ 1537.384809] 0 pages hwpoisoned
[ 1537.387902] Tasks state (memory values in pages):
[ 1537.392652] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 1537.401356] [    201]   993   201     1130       72    45056        0             0 rpcbind
[ 1537.409772] [    202]     0   202     4529     1640    77824        0          -250 systemd-journal
[ 1537.418861] [    222]     0   222     4691      801    69632        0         -1000 systemd-udevd
[ 1537.427787] [    248]   994   248    20914      130    65536        0             0 systemd-timesyn
[ 1537.436884] [    497]     0   497      620       31    49152        0             0 atd
[ 1537.444938] [    500]     0   500      854       77    53248        0             0 crond
[ 1537.453165] [    503]   997   503     1470      160    49152        0          -900 dbus-daemon
[ 1537.461908] [    505]     0   505      633       24    40960        0             0 firmwared
[ 1537.470491] [    513]     0   513     2507      180    61440        0             0 ofonod
[ 1537.478800] [    514]   990   514    69640      137    81920        0             0 parsec
[ 1537.487120] [    533]     0   533      599       39    40960        0             0 syslogd
[ 1537.495518] [    534]     0   534     4546      148    65536        0             0 systemd-logind
[ 1537.504560] [    535]     0   535      690       24    45056        0             0 tee-supplicant
[ 1537.513564] [    540]   996   540     2769      168    61440        0             0 systemd-network
[ 1537.522680] [    566]     0   566     3878      228    77824        0             0 connmand
[ 1537.531168] [    645]   998   645     1538      133    57344        0             0 avahi-daemon
[ 1537.540004] [    646]   998   646     1461       64    57344        0             0 avahi-daemon
[ 1537.548846] [    648]   992   648      781       41    45056        0             0 rpc.statd
[ 1537.557415] [    650] 64371   650      590       23    45056        0             0 ninfod
[ 1537.565754] [    653] 61563   653      555       24    45056        0             0 rdisc
[ 1537.573971] [    655]     0   655   374569     2999   290816        0          -999 containerd
[ 1537.582621] [    658]     0   658     1311       20    49152        0             0 agetty
[ 1537.590922] [    663]     0   663     1529       97    49152        0             0 login
[ 1537.599138] [    666]     0   666     3430      202    69632        0             0 wpa_supplicant
[ 1537.608147] [    667]     0   667     2344       96    61440        0             0 systemd-userdbd
[ 1537.617240] [    677]     0   677     2964      314    65536        0           100 systemd
[ 1537.625651] [    679]     0   679     3720      646    73728        0           100 (sd-pam)
[ 1537.634138] [    687]     0   687     1289      403    45056        0             0 sh
[ 1537.642108] [    789]     0   789      970       93    45056        0             0 eth_test2.sh
[ 1537.650955] [   2355]     0  2355     2346       94    61440        0             0 systemd-userwor
[ 1537.660046] [   2356]     0  2356     2346       94    61440        0             0 systemd-userwor
[ 1537.669137] [   2358]     0  2358     2346       95    57344        0             0 systemd-userwor
[ 1537.678258] [   2379]     0  2379      970       93    45056        0             0 eth_test2.sh
[ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0
[ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
[ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec

Fixes: 95698ff ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: shenwei wang <Shenwei.wang@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
chewitt pushed a commit that referenced this pull request Mar 9, 2023
Jakub Sitnicki says:

====================

This patch set addresses the syzbot report in [1].

Patch #1 has been suggested by Eric [2]. I extended it to cover the rest of
sock_map proto callbacks. Otherwise we would still overflow the stack.

Patch #2 contains the actual fix and bug analysis.
Patches #3 & #4 add coverage to selftests to trigger the bug.

[1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/
[2] https://lore.kernel.org/all/CANn89iK2UN1FmdUcH12fv_xiZkv2G+Nskvmq7fG6aA_6VKRf6g@mail.gmail.com/
---
v1 -> v2:
v1: https://lore.kernel.org/r/20230113-sockmap-fix-v1-0-d3cad092ee10@cloudflare.com
[v1 didn't hit bpf@ ML by mistake]

 * pull in Eric's patch to protect against recursion loop bugs (Eric)
 * add a macro helper to check if pointer is inside a memory range (Eric)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
chewitt pushed a commit that referenced this pull request Mar 9, 2023
Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following
crash:

[   32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170
[   32.093115] Oops [#1]
[   32.093251] Modules linked in:
[   32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16
[   32.093985] Hardware name: riscv-virtio,qemu (DT)
[   32.094280] epc : ksys_read+0x88/0xd6
[   32.094855]  ra : ksys_read+0xc0/0xd6
[   32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0
[   32.095227]  gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80
[   32.095500]  t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60
[   32.095716]  s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708
[   32.095921]  a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300
[   32.096171]  a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff
[   32.096411]  s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170
[   32.096638]  s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030
[   32.096865]  s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410
[   32.097092]  s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c
[   32.097317]  t5 : ffffffff8000c29c t6 : ffffffff800dbc54
[   32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d
[   32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6
[   32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38
[   32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2

Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)',
hence it requires obtaining the target address from rs1 followed by
updating rd.

Fixes: c22b0bc ("riscv: Add kprobes supported")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com
[Palmer: Pick Guo's cleanup]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
chewitt pushed a commit that referenced this pull request Dec 1, 2025
…ptcp_do_fastclose().

syzbot reported divide-by-zero in __tcp_select_window() by
MPTCP socket. [0]

We had a similar issue for the bare TCP and fixed in commit
499350a ("tcp: initialize rcv_mss to TCP_MIN_MSS instead
of 0").

Let's apply the same fix to mptcp_do_fastclose().

[0]:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336
Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00
RSP: 0018:ffffc90003017640 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028
R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 tcp_select_window net/ipv4/tcp_output.c:281 [inline]
 __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568
 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline]
 tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836
 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793
 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253
 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776
 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg+0xe5/0x270 net/socket.c:742
 __sys_sendto+0x3bd/0x520 net/socket.c:2244
 __do_sys_sendto net/socket.c:2251 [inline]
 __se_sys_sendto net/socket.c:2247 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2247
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f66e998f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006
 </TASK>

Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose")
Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Dec 10, 2025
…3576

When I added command queue engine (CQE) support for the Rockchip eMMC
controller, I missed that RK3576 has a separate platform data struct.
While things are working fine on RK3588 (I tested the ROCK 5B) and
the suspend issue is fixed on the RK3576 (I tested the Sige5), this
results in stability issues. By also adding the necessary hooks for
the RK3576 platform the following problems can be avoided:

[   15.606895] mmc0: running CQE recovery
[   15.616189] mmc0: running CQE recovery
[...]
[   25.911484] mmc0: running CQE recovery
[   25.926305] mmc0: running CQE recovery
[   25.927468] mmc0: running CQE recovery
[...]
[   26.255719] mmc0: running CQE recovery
[   26.257162] ------------[ cut here ]------------
[   26.257581] mmc0: cqhci: spurious TCN for tag 31
[   26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c
[   26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT
[   26.264561] Hardware name: ArmSoM Sige5 (DT)
[...]
[   26.272748] Call trace:
[   26.272964]  cqhci_irq+0x440/0x68c (P)
[   26.273296]  dwcmshc_cqe_irq_handler+0x54/0x88
[   26.273689]  sdhci_irq+0xbc/0x1200
[   26.273991]  __handle_irq_event_percpu+0x54/0x1d0
[...]

Note that the above problems do not necessarily happen with every boot.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/
Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
chewitt pushed a commit that referenced this pull request Dec 15, 2025
…em corrupted

commit 986835b upstream.

There's issue when file system corrupted:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1289!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next
RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0
RSP: 0018:ffff888117aafa30 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534
RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010
RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __ext4_journal_get_create_access+0x42/0x170
 ext4_getblk+0x319/0x6f0
 ext4_bread+0x11/0x100
 ext4_append+0x1e6/0x4a0
 ext4_init_new_dir+0x145/0x1d0
 ext4_mkdir+0x326/0x920
 vfs_mkdir+0x45c/0x740
 do_mkdirat+0x234/0x2f0
 __x64_sys_mkdir+0xd6/0x120
 do_syscall_64+0x5f/0xfa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The above issue occurs with us in errors=continue mode when accompanied by
storage failures. There have been many inconsistencies in the file system
data.
In the case of file system data inconsistency, for example, if the block
bitmap of a referenced block is not set, it can lead to the situation where
a block being committed is allocated and used again. As a result, the
following condition will not be satisfied then trigger BUG_ON. Of course,
it is entirely possible to construct a problematic image that can trigger
this BUG_ON through specific operations. In fact, I have constructed such
an image and easily reproduced this issue.
Therefore, J_ASSERT() holds true only under ideal conditions, but it may
not necessarily be satisfied in exceptional scenarios. Using J_ASSERT()
directly in abnormal situations would cause the system to crash, which is
clearly not what we want. So here we directly trigger a JBD abort instead
of immediately invoking BUG_ON.

Fixes: 470decc ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Dec 15, 2025
commit 0cd8fee upstream.

Fix a race between inline data destruction and block mapping.

The function ext4_destroy_inline_data_nolock() changes the inode data
layout by clearing EXT4_INODE_INLINE_DATA and setting EXT4_INODE_EXTENTS.
At the same time, another thread may execute ext4_map_blocks(), which
tests EXT4_INODE_EXTENTS to decide whether to call ext4_ext_map_blocks()
or ext4_ind_map_blocks().

Without i_data_sem protection, ext4_ind_map_blocks() may receive inode
with EXT4_INODE_EXTENTS flag and triggering assert.

kernel BUG at fs/ext4/indirect.c:546!
EXT4-fs (loop2): unmounting filesystem.
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:ext4_ind_map_blocks.cold+0x2b/0x5a fs/ext4/indirect.c:546

Call Trace:
 <TASK>
 ext4_map_blocks+0xb9b/0x16f0 fs/ext4/inode.c:681
 _ext4_get_block+0x242/0x590 fs/ext4/inode.c:822
 ext4_block_write_begin+0x48b/0x12c0 fs/ext4/inode.c:1124
 ext4_write_begin+0x598/0xef0 fs/ext4/inode.c:1255
 ext4_da_write_begin+0x21e/0x9c0 fs/ext4/inode.c:3000
 generic_perform_write+0x259/0x5d0 mm/filemap.c:3846
 ext4_buffered_write_iter+0x15b/0x470 fs/ext4/file.c:285
 ext4_file_write_iter+0x8e0/0x17f0 fs/ext4/file.c:679
 call_write_iter include/linux/fs.h:2271 [inline]
 do_iter_readv_writev+0x212/0x3c0 fs/read_write.c:735
 do_iter_write+0x186/0x710 fs/read_write.c:861
 vfs_iter_write+0x70/0xa0 fs/read_write.c:902
 iter_file_splice_write+0x73b/0xc90 fs/splice.c:685
 do_splice_from fs/splice.c:763 [inline]
 direct_splice_actor+0x10f/0x170 fs/splice.c:950
 splice_direct_to_actor+0x33a/0xa10 fs/splice.c:896
 do_splice_direct+0x1a9/0x280 fs/splice.c:1002
 do_sendfile+0xb13/0x12c0 fs/read_write.c:1255
 __do_sys_sendfile64 fs/read_write.c:1323 [inline]
 __se_sys_sendfile64 fs/read_write.c:1309 [inline]
 __x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: c755e25 ("ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Message-ID: <20251104093326.697381-1-sdl@nppct.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Dec 15, 2025
commit 3e0ae02 upstream.

Rust Binder contains the following unsafe operation:

	// SAFETY: A `NodeDeath` is never inserted into the death list
	// of any node other than its owner, so it is either in this
	// death list or in no death list.
	unsafe { node_inner.death_list.remove(self) };

This operation is unsafe because when touching the prev/next pointers of
a list element, we have to ensure that no other thread is also touching
them in parallel. If the node is present in the list that `remove` is
called on, then that is fine because we have exclusive access to that
list. If the node is not in any list, then it's also ok. But if it's
present in a different list that may be accessed in parallel, then that
may be a data race on the prev/next pointers.

And unfortunately that is exactly what is happening here. In
Node::release, we:

 1. Take the lock.
 2. Move all items to a local list on the stack.
 3. Drop the lock.
 4. Iterate the local list on the stack.

Combined with threads using the unsafe remove method on the original
list, this leads to memory corruption of the prev/next pointers. This
leads to crashes like this one:

	Unable to handle kernel paging request at virtual address 000bb9841bcac70e
	Mem abort info:
	  ESR = 0x0000000096000044
	  EC = 0x25: DABT (current EL), IL = 32 bits
	  SET = 0, FnV = 0
	  EA = 0, S1PTW = 0
	  FSC = 0x04: level 0 translation fault
	Data abort info:
	  ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
	  CM = 0, WnR = 1, TnD = 0, TagAccess = 0
	  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
	[000bb9841bcac70e] address between user and kernel address ranges
	Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
	google-cdd 538c004.gcdd: context saved(CPU:1)
	item - log_kevents is disabled
	Modules linked in: ... rust_binder
	CPU: 1 UID: 0 PID: 2092 Comm: kworker/1:178 Tainted: G S      W  OE      6.12.52-android16-5-g98debd5df505-4k #1 f94a6367396c5488d635708e43ee0c888d230b0b
	Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
	Hardware name: MUSTANG PVT 1.0 based on LGA (DT)
	Workqueue: events _RNvXs6_NtCsdfZWD8DztAw_6kernel9workqueueINtNtNtB7_4sync3arc3ArcNtNtCs8QPsHWIn21X_16rust_binder_main7process7ProcessEINtB5_15WorkItemPointerKy0_E3runB13_ [rust_binder]
	pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
	pc : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder]
	lr : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x464/0x11f8 [rust_binder]
	sp : ffffffc09b433ac0
	x29: ffffffc09b433d30 x28: ffffff8821690000 x27: ffffffd40cbaa448
	x26: ffffff8821690000 x25: 00000000ffffffff x24: ffffff88d0376578
	x23: 0000000000000001 x22: ffffffc09b433c78 x21: ffffff88e8f9bf40
	x20: ffffff88e8f9bf40 x19: ffffff882692b000 x18: ffffffd40f10bf00
	x17: 00000000c006287d x16: 00000000c006287d x15: 00000000000003b0
	x14: 0000000000000100 x13: 000000201cb79ae0 x12: fffffffffffffff0
	x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000000
	x8 : b80bb9841bcac706 x7 : 0000000000000001 x6 : fffffffebee63f30
	x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
	x2 : 0000000000004c31 x1 : ffffff88216900c0 x0 : ffffff88e8f9bf00
	Call trace:
	 _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder bbc172b53665bbc815363b22e97e3f7e3fe971fc]
	 process_scheduled_works+0x1c4/0x45c
	 worker_thread+0x32c/0x3e8
	 kthread+0x11c/0x1c8
	 ret_from_fork+0x10/0x20
	Code: 94218d85 b4000155 a94026a8 d10102a0 (f9000509)
	---[ end trace 0000000000000000 ]---

Thus, modify Node::release to pop items directly off the original list.

Cc: stable@vger.kernel.org
Fixes: eafedbc ("rust_binder: add Rust Binder driver")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://patch.msgid.link/20251111-binder-fix-list-remove-v1-1-8ed14a0da63d@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Dec 15, 2025
commit a51f025 upstream.

Syzbot identified an issue [1] in pcl818_ai_cancel(), which stems from
the fact that in case of early device detach via pcl818_detach(),
subdevice dev->read_subdev may not have initialized its pointer to
&struct comedi_async as intended. Thus, any such dereferencing of
&s->async->cmd will lead to general protection fault and kernel crash.

Mitigate this problem by removing a call to pcl818_ai_cancel() from
pcl818_detach() altogether. This way, if the subdevice setups its
support for async commands, everything async-related will be
handled via subdevice's own ->cancel() function in
comedi_device_detach_locked() even before pcl818_detach(). If no
support for asynchronous commands is provided, there is no need
to cancel anything either.

[1] Syzbot crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 6050 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:pcl818_ai_cancel+0x69/0x3f0 drivers/comedi/drivers/pcl818.c:762
...
Call Trace:
 <TASK>
 pcl818_detach+0x66/0xd0 drivers/comedi/drivers/pcl818.c:1115
 comedi_device_detach_locked+0x178/0x750 drivers/comedi/drivers.c:207
 do_devconfig_ioctl drivers/comedi/comedi_fops.c:848 [inline]
 comedi_unlocked_ioctl+0xcde/0x1020 drivers/comedi/comedi_fops.c:2178
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
...

Reported-by: syzbot+fce5d9d5bd067d6fbe9b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fce5d9d5bd067d6fbe9b
Fixes: 00aba6e ("staging: comedi: pcl818: remove 'neverending_ai' from private data")
Cc: stable <stable@kernel.org>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://patch.msgid.link/20251023141457.398685-1-n.zhandarovich@fintech.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chewitt pushed a commit that referenced this pull request Dec 15, 2025
…3576

When I added command queue engine (CQE) support for the Rockchip eMMC
controller, I missed that RK3576 has a separate platform data struct.
While things are working fine on RK3588 (I tested the ROCK 5B) and
the suspend issue is fixed on the RK3576 (I tested the Sige5), this
results in stability issues. By also adding the necessary hooks for
the RK3576 platform the following problems can be avoided:

[   15.606895] mmc0: running CQE recovery
[   15.616189] mmc0: running CQE recovery
[...]
[   25.911484] mmc0: running CQE recovery
[   25.926305] mmc0: running CQE recovery
[   25.927468] mmc0: running CQE recovery
[...]
[   26.255719] mmc0: running CQE recovery
[   26.257162] ------------[ cut here ]------------
[   26.257581] mmc0: cqhci: spurious TCN for tag 31
[   26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c
[   26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT
[   26.264561] Hardware name: ArmSoM Sige5 (DT)
[...]
[   26.272748] Call trace:
[   26.272964]  cqhci_irq+0x440/0x68c (P)
[   26.273296]  dwcmshc_cqe_irq_handler+0x54/0x88
[   26.273689]  sdhci_irq+0xbc/0x1200
[   26.273991]  __handle_irq_event_percpu+0x54/0x1d0
[...]

Note that the above problems do not necessarily happen with every boot.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/
Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
chewitt pushed a commit that referenced this pull request Dec 18, 2025
…3576

When I added command queue engine (CQE) support for the Rockchip eMMC
controller, I missed that RK3576 has a separate platform data struct.
While things are working fine on RK3588 (I tested the ROCK 5B) and
the suspend issue is fixed on the RK3576 (I tested the Sige5), this
results in stability issues. By also adding the necessary hooks for
the RK3576 platform the following problems can be avoided:

[   15.606895] mmc0: running CQE recovery
[   15.616189] mmc0: running CQE recovery
[...]
[   25.911484] mmc0: running CQE recovery
[   25.926305] mmc0: running CQE recovery
[   25.927468] mmc0: running CQE recovery
[...]
[   26.255719] mmc0: running CQE recovery
[   26.257162] ------------[ cut here ]------------
[   26.257581] mmc0: cqhci: spurious TCN for tag 31
[   26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c
[   26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT
[   26.264561] Hardware name: ArmSoM Sige5 (DT)
[...]
[   26.272748] Call trace:
[   26.272964]  cqhci_irq+0x440/0x68c (P)
[   26.273296]  dwcmshc_cqe_irq_handler+0x54/0x88
[   26.273689]  sdhci_irq+0xbc/0x1200
[   26.273991]  __handle_irq_event_percpu+0x54/0x1d0
[...]

Note that the above problems do not necessarily happen with every boot.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/
Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit 78b4d64 ]

A timer that expires a vgem fence automatically in 10 seconds is now
released with timer_delete_sync() from fence->ops.release() called on last
dma_fence_put().  In some scenarios, it can run in IRQ context, which is
not safe unless TIMER_IRQSAFE is used.  One potentially risky scenario was
demonstrated in Intel DRM CI trybot, BAT run on machine bat-adlp-6, while
working on new IGT subtests syncobj_timeline@stress-* as user space
replacements of some problematic test cases of a dma-fence-chain selftest
[1].

[117.004338] ================================
[117.004340] WARNING: inconsistent lock state
[117.004342] 6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 Tainted: G S   U
[117.004346] --------------------------------
[117.004347] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[117.004349] swapper/0/0 [HC1[1]:SC1[1]:HE0:SE0] takes:
[117.004352] ffff888138f86aa8 ((&fence->timer)){?.-.}-{0:0}, at: __timer_delete_sync+0x4b/0x190
[117.004361] {HARDIRQ-ON-W} state was registered at:
[117.004363]   lock_acquire+0xc4/0x2e0
[117.004366]   call_timer_fn+0x80/0x2a0
[117.004368]   __run_timers+0x231/0x310
[117.004370]   run_timer_softirq+0x76/0xe0
[117.004372]   handle_softirqs+0xd4/0x4d0
[117.004375]   __irq_exit_rcu+0x13f/0x160
[117.004377]   irq_exit_rcu+0xe/0x20
[117.004379]   sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004382]   asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004385]   cpuidle_enter_state+0x12b/0x8a0
[117.004388]   cpuidle_enter+0x2e/0x50
[117.004393]   call_cpuidle+0x22/0x60
[117.004395]   do_idle+0x1fd/0x260
[117.004398]   cpu_startup_entry+0x29/0x30
[117.004401]   start_secondary+0x12d/0x160
[117.004404]   common_startup_64+0x13e/0x141
[117.004407] irq event stamp: 2282669
[117.004409] hardirqs last  enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80
[117.004414] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0
[117.004419] softirqs last  enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18
[117.004423] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160
[117.004426]
other info that might help us debug this:
[117.004429]  Possible unsafe locking scenario:
[117.004432]        CPU0
[117.004433]        ----
[117.004434]   lock((&fence->timer));
[117.004436]   <Interrupt>
[117.004438]     lock((&fence->timer));
[117.004440]
 *** DEADLOCK ***
[117.004443] 1 lock held by swapper/0/0:
[117.004445]  #0: ffffc90000003d50 ((&fence->timer)){?.-.}-{0:0}, at: call_timer_fn+0x7a/0x2a0
[117.004450]
stack backtrace:
[117.004453] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S   U              6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary)
[117.004455] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
[117.004455] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
[117.004456] Call Trace:
[117.004456]  <IRQ>
[117.004457]  dump_stack_lvl+0x91/0xf0
[117.004460]  dump_stack+0x10/0x20
[117.004461]  print_usage_bug.part.0+0x260/0x360
[117.004463]  mark_lock+0x76e/0x9c0
[117.004465]  ? register_lock_class+0x48/0x4a0
[117.004467]  __lock_acquire+0xbc3/0x2860
[117.004469]  lock_acquire+0xc4/0x2e0
[117.004470]  ? __timer_delete_sync+0x4b/0x190
[117.004472]  ? __timer_delete_sync+0x4b/0x190
[117.004473]  __timer_delete_sync+0x68/0x190
[117.004474]  ? __timer_delete_sync+0x4b/0x190
[117.004475]  timer_delete_sync+0x10/0x20
[117.004476]  vgem_fence_release+0x19/0x30 [vgem]
[117.004478]  dma_fence_release+0xc1/0x3b0
[117.004480]  ? dma_fence_release+0xa1/0x3b0
[117.004481]  dma_fence_chain_release+0xe7/0x130
[117.004483]  dma_fence_release+0xc1/0x3b0
[117.004484]  ? _raw_spin_unlock_irqrestore+0x27/0x80
[117.004485]  dma_fence_chain_irq_work+0x59/0x80
[117.004487]  irq_work_single+0x75/0xa0
[117.004490]  irq_work_run_list+0x33/0x60
[117.004491]  irq_work_run+0x18/0x40
[117.004493]  __sysvec_irq_work+0x35/0x170
[117.004494]  sysvec_irq_work+0x47/0xc0
[117.004496]  asm_sysvec_irq_work+0x1b/0x20
[117.004497] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
[117.004499] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3
[117.004499] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246
[117.004500] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000
[117.004501] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004502] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000
[117.004502] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
[117.004502] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80
[117.004506]  dma_fence_signal+0x49/0xb0
[117.004507]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004508]  vgem_fence_timeout+0x12/0x20 [vgem]
[117.004509]  call_timer_fn+0xa1/0x2a0
[117.004512]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004513]  __run_timers+0x231/0x310
[117.004514]  ? tmigr_handle_remote+0x2ac/0x560
[117.004517]  timer_expire_remote+0x46/0x70
[117.004518]  tmigr_handle_remote+0x433/0x560
[117.004520]  ? __run_timers+0x239/0x310
[117.004521]  ? run_timer_softirq+0x21/0xe0
[117.004522]  ? lock_release+0xce/0x2a0
[117.004524]  run_timer_softirq+0xcf/0xe0
[117.004525]  handle_softirqs+0xd4/0x4d0
[117.004526]  __irq_exit_rcu+0x13f/0x160
[117.004527]  irq_exit_rcu+0xe/0x20
[117.004528]  sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004529]  </IRQ>
[117.004529]  <TASK>
[117.004529]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004530] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0
[117.004532] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00
[117.004532] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246
[117.004533] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000
[117.004533] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004534] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000
[117.004534] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80
[117.004534] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b
[117.004537]  ? cpuidle_enter_state+0x125/0x8a0
[117.004538]  ? sched_clock_noinstr+0x9/0x10
[117.004540]  cpuidle_enter+0x2e/0x50
[117.004542]  call_cpuidle+0x22/0x60
[117.004542]  do_idle+0x1fd/0x260
[117.004544]  cpu_startup_entry+0x29/0x30
[117.004546]  rest_init+0x104/0x200
[117.004548]  start_kernel+0x93d/0xbd0
[117.004550]  ? load_ucode_intel_bsp+0x2a/0x90
[117.004551]  ? sme_unmap_bootdata+0x14/0x80
[117.004554]  x86_64_start_reservations+0x18/0x30
[117.004555]  x86_64_start_kernel+0xfd/0x150
[117.004556]  ? soft_restart_cpu+0x14/0x14
[117.004558]  common_startup_64+0x13e/0x141
[117.004560]  </TASK>
[117.004565] ------------[ cut here ]------------
[117.004692] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1610 __timer_delete_sync+0x126/0x190
[117.004697] Modules linked in: vgem snd_hda_codec_intelhdmi snd_hda_codec_hdmi i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit hid_sensor_custom hid_sensor_hub hid_generic intel_ishtp_hid hid intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp cmdlinepart ee1004 r8153_ecm spi_nor coretemp cdc_ether mei_pxp mei_hdcp usbnet mtd intel_rapl_msr wmi_bmof kvm_intel snd_hda_intel snd_intel_dspcfg processor_thermal_device_pci kvm snd_hda_codec processor_thermal_device irqbypass processor_thermal_wt_hint polyval_clmulni platform_temperature_control snd_hda_core ghash_clmulni_intel processor_thermal_rfim spi_pxa2xx_platform snd_hwdep aesni_intel processor_thermal_rapl dw_dmac snd_pcm dw_dmac_core intel_rapl_common r8152 rapl mii intel_cstate spi_pxa2xx_core i2c_i801 processor_thermal_wt_req snd_timer i2c_mux mei_me intel_ish_ipc processor_thermal_power_floor e1000e snd i2c_smbus spi_intel_pci processor_thermal_mbox mei soundcore intel_ishtp thunderbolt idma64
[117.004733]  spi_intel int340x_thermal_zone igen6_edac binfmt_misc intel_skl_int3472_tps68470 intel_pmc_core tps68470_regulator video clk_tps68470 pmt_telemetry pmt_discovery nls_iso8859_1 pmt_class intel_pmc_ssram_telemetry intel_skl_int3472_discrete int3400_thermal intel_hid intel_skl_int3472_common acpi_thermal_rel intel_vsec wmi pinctrl_tigerlake acpi_tad sparse_keymap acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
[117.004782] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S   U              6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary)
[117.004787] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
[117.004789] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
[117.004793] RIP: 0010:__timer_delete_sync+0x126/0x190
[117.004795] Code: 31 c0 45 31 c9 c3 cc cc cc cc 48 8b 75 d0 45 84 f6 74 63 49 c7 45 18 00 00 00 00 48 89 c7 e8 51 46 39 01 f3 90 e9 66 ff ff ff <0f> 0b e9 5f ff ff ff e8 ee e4 0c 00 49 8d 5d 28 45 31 c9 31 c9 4c
[117.004801] RSP: 0018:ffffc90000003a40 EFLAGS: 00010046
[117.004804] RAX: ffffffff815093fb RBX: ffff888138f86aa8 RCX: 0000000000000000
[117.004807] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004809] RBP: ffffc90000003a70 R08: 0000000000000000 R09: 0000000000000000
[117.004812] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815093fb
[117.004814] R13: ffff888138f86a80 R14: 0000000000000000 R15: 0000000000000000
[117.004817] FS:  0000000000000000(0000) GS:ffff88890b0f7000(0000) knlGS:0000000000000000
[117.004820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[117.004823] CR2: 00005db8131eb7f0 CR3: 0000000003448000 CR4: 0000000000f52ef0
[117.004826] PKRU: 55555554
[117.004827] Call Trace:
[117.004829]  <IRQ>
[117.004831]  timer_delete_sync+0x10/0x20
[117.004833]  vgem_fence_release+0x19/0x30 [vgem]
[117.004836]  dma_fence_release+0xc1/0x3b0
[117.004838]  ? dma_fence_release+0xa1/0x3b0
[117.004841]  dma_fence_chain_release+0xe7/0x130
[117.004844]  dma_fence_release+0xc1/0x3b0
[117.004847]  ? _raw_spin_unlock_irqrestore+0x27/0x80
[117.004850]  dma_fence_chain_irq_work+0x59/0x80
[117.004853]  irq_work_single+0x75/0xa0
[117.004857]  irq_work_run_list+0x33/0x60
[117.004860]  irq_work_run+0x18/0x40
[117.004863]  __sysvec_irq_work+0x35/0x170
[117.004865]  sysvec_irq_work+0x47/0xc0
[117.004868]  asm_sysvec_irq_work+0x1b/0x20
[117.004871] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
[117.004874] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3
[117.004879] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246
[117.004882] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000
[117.004884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004887] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000
[117.004890] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
[117.004892] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80
[117.004897]  dma_fence_signal+0x49/0xb0
[117.004899]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004902]  vgem_fence_timeout+0x12/0x20 [vgem]
[117.004904]  call_timer_fn+0xa1/0x2a0
[117.004908]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004910]  __run_timers+0x231/0x310
[117.004913]  ? tmigr_handle_remote+0x2ac/0x560
[117.004917]  timer_expire_remote+0x46/0x70
[117.004919]  tmigr_handle_remote+0x433/0x560
[117.004923]  ? __run_timers+0x239/0x310
[117.004925]  ? run_timer_softirq+0x21/0xe0
[117.004928]  ? lock_release+0xce/0x2a0
[117.004931]  run_timer_softirq+0xcf/0xe0
[117.004933]  handle_softirqs+0xd4/0x4d0
[117.004936]  __irq_exit_rcu+0x13f/0x160
[117.004938]  irq_exit_rcu+0xe/0x20
[117.004940]  sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004943]  </IRQ>
[117.004944]  <TASK>
[117.004946]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004949] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0
[117.004953] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00
[117.004961] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246
[117.004963] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000
[117.004966] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004968] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000
[117.004971] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80
[117.004974] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b
[117.004978]  ? cpuidle_enter_state+0x125/0x8a0
[117.004981]  ? sched_clock_noinstr+0x9/0x10
[117.004985]  cpuidle_enter+0x2e/0x50
[117.004989]  call_cpuidle+0x22/0x60
[117.004991]  do_idle+0x1fd/0x260
[117.005001]  cpu_startup_entry+0x29/0x30
[117.005004]  rest_init+0x104/0x200
[117.005008]  start_kernel+0x93d/0xbd0
[117.005011]  ? load_ucode_intel_bsp+0x2a/0x90
[117.005014]  ? sme_unmap_bootdata+0x14/0x80
[117.005017]  x86_64_start_reservations+0x18/0x30
[117.005020]  x86_64_start_kernel+0xfd/0x150
[117.005023]  ? soft_restart_cpu+0x14/0x14
[117.005026]  common_startup_64+0x13e/0x141
[117.005030]  </TASK>
[117.005032] irq event stamp: 2282669
[117.005034] hardirqs last  enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80
[117.005038] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0
[117.005043] softirqs last  enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18
[117.005047] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160
[117.005051] ---[ end trace 0000000000000000 ]---

Make the timer IRQ safe.

[1] https://patchwork.freedesktop.org/series/154987/#rev2

Fixes: 4077798 ("drm/vgem: Attach sw fences to exported vGEM dma-buf (ioctl)")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250926152628.2165080-2-janusz.krzysztofik@linux.intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      #6 0x458090 in record__synthesize builtin-record.c:2075
      #7 0x45a85a in __cmd_record builtin-record.c:2888
      #8 0x45deb6 in cmd_record builtin-record.c:4374
      #9 0x4e5e33 in run_builtin perf.c:349
      #10 0x4e60bf in handle_internal_command perf.c:401
      #11 0x4e6215 in run_argv perf.c:448
      #12 0x4e653a in main perf.c:555
      #13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      #14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit 23b2d2f ]

When booting with KASAN enabled the following splat is
encountered during probe of the k1 clock driver:

UBSAN: array-index-out-of-bounds in drivers/clk/spacemit/ccu-k1.c:1044:16
index 0 is out of range for type 'clk_hw *[*]'
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc5+ #1 PREEMPT(lazy)
Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.10spacemit 10/01/2022
Call Trace:
[<ffffffff8002b628>] dump_backtrace+0x28/0x38
[<ffffffff800027d2>] show_stack+0x3a/0x50
[<ffffffff800220c2>] dump_stack_lvl+0x5a/0x80
[<ffffffff80022100>] dump_stack+0x18/0x20
[<ffffffff800164b8>] ubsan_epilogue+0x10/0x48
[<ffffffff8099034e>] __ubsan_handle_out_of_bounds+0xa6/0xa8
[<ffffffff80acbfa6>] k1_ccu_probe+0x37e/0x420
[<ffffffff80b79e6e>] platform_probe+0x56/0x98
[<ffffffff80b76a7e>] really_probe+0x9e/0x350
[<ffffffff80b76db0>] __driver_probe_device+0x80/0x138
[<ffffffff80b76f52>] driver_probe_device+0x3a/0xd0
[<ffffffff80b771c4>] __driver_attach+0xac/0x1b8
[<ffffffff80b742fc>] bus_for_each_dev+0x6c/0xc8
[<ffffffff80b76296>] driver_attach+0x26/0x38
[<ffffffff80b759ae>] bus_add_driver+0x13e/0x268
[<ffffffff80b7836a>] driver_register+0x52/0x100
[<ffffffff80b79a78>] __platform_driver_register+0x28/0x38
[<ffffffff814585da>] k1_ccu_driver_init+0x22/0x38
[<ffffffff80023a8a>] do_one_initcall+0x62/0x2a0
[<ffffffff81401c60>] do_initcalls+0x170/0x1a8
[<ffffffff81401e7a>] kernel_init_freeable+0x16a/0x1e0
[<ffffffff811f7534>] kernel_init+0x2c/0x180
[<ffffffff80025f56>] ret_from_fork_kernel+0x16/0x1d8
[<ffffffff81205336>] ret_from_fork_kernel_asm+0x16/0x18
---[ end trace ]---

This is bogus and is simply a result of KASAN consulting the
`.num` member of the struct for bounds information (as it should
due to `__counted_by`) and finding 0 set by kzalloc() because it
has not been initialized before the loop that fills in the array.
The easy fix is to just move the line that sets `num` to before
the loop that fills the array so that KASAN has the information
it needs to accurately conclude that the access is valid.

Fixes: 1b72c59 ("clk: spacemit: Add clock support for SpacemiT K1 SoC")
Tested-by: Yanko Kaneti <yaneti@declera.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Reviewed-by: Alex Elder <elder@riscstar.com>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit 385aab8 ]

MT7996 driver can use both wed and wed_hif2 devices to offload traffic
from/to the wireless NIC. In the current codebase we assume to always
use the primary wed device in wed callbacks resulting in the following
crash if the hw runs wed_hif2 (e.g. 6GHz link).

[  297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a
[  297.464928] Mem abort info:
[  297.467722]   ESR = 0x0000000096000005
[  297.471461]   EC = 0x25: DABT (current EL), IL = 32 bits
[  297.476766]   SET = 0, FnV = 0
[  297.479809]   EA = 0, S1PTW = 0
[  297.482940]   FSC = 0x05: level 1 translation fault
[  297.487809] Data abort info:
[  297.490679]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  297.496156]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  297.501196]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000
[  297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000
[  297.523532] Internal error: Oops: 0000000096000005 [#1] SMP
[  297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G           O       6.12.50 #0
[  297.723908] Tainted: [O]=OOT_MODULE
[  297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT)
[  297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table]
[  297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76]
[  297.752688] lr : mtk_wed_flow_remove+0x58/0x80
[  297.757126] sp : ffffffc080fe3ae0
[  297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7
[  297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00
[  297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018
[  297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000
[  297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000
[  297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da
[  297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200
[  297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002
[  297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000
[  297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8
[  297.831686] Call trace:
[  297.834123]  mt76_wed_offload_disable+0x64/0xa0 [mt76]
[  297.839254]  mtk_wed_flow_remove+0x58/0x80
[  297.843342]  mtk_flow_offload_cmd+0x434/0x574
[  297.847689]  mtk_wed_setup_tc_block_cb+0x30/0x40
[  297.852295]  nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table]
[  297.858466]  nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table]
[  297.864463]  process_one_work+0x174/0x300
[  297.868465]  worker_thread+0x278/0x430
[  297.872204]  kthread+0xd8/0xdc
[  297.875251]  ret_from_fork+0x10/0x20
[  297.878820] Code: 928b5ae0 8b000273 91400a60 f943fa61 (79401421)
[  297.884901] ---[ end trace 0000000000000000 ]---

Fix the issue detecting the proper wed reference to use running wed
callabacks.

Fixes: 83eafc9 ("wifi: mt76: mt7996: add wed tx support")
Tested-by: Daniel Pawlik <pawlik.dan@gmail.com>
Tested-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251008-wed-fixes-v1-1-8f7678583385@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit 1f73a56 ]

Neither sock4 nor sock6 pointers are guaranteed to be non-NULL in
vxlan_xmit_one, e.g. if the iface is brought down. This can lead to the
following NULL dereference:

  BUG: kernel NULL pointer dereference, address: 0000000000000010
  Oops: Oops: 0000 [#1] SMP NOPTI
  RIP: 0010:vxlan_xmit_one+0xbb3/0x1580
  Call Trace:
   vxlan_xmit+0x429/0x610
   dev_hard_start_xmit+0x55/0xa0
   __dev_queue_xmit+0x6d0/0x7f0
   ip_finish_output2+0x24b/0x590
   ip_output+0x63/0x110

Mentioned commits changed the code path in vxlan_xmit_one and as a side
effect the sock4/6 pointer validity checks in vxlan(6)_get_route were
lost. Fix this by adding back checks.

Since both commits being fixed were released in the same version (v6.7)
and are strongly related, bundle the fixes in a single commit.

Reported-by: Liang Li <liali@redhat.com>
Fixes: 6f19b2c ("vxlan: use generic function for tunnel IPv4 route lookup")
Fixes: 2aceb89 ("vxlan: use generic function for tunnel IPv6 route lookup")
Cc: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251126102627.74223-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
…ne() and smb_direct_cm_handler()

[ Upstream commit 425c327 ]

Namjae reported the following:

I have a simple file copy test with windows 11 client, and get the
following error message.

[  894.140312] ------------[ cut here ]------------
[  894.140316] WARNING: CPU: 1 PID: 116 at
fs/smb/server/transport_rdma.c:642 recv_done+0x308/0x360 [ksmbd]
[  894.140335] Modules linked in: ksmbd cmac nls_utf8 nls_ucs2_utils
libarc4 nls_iso8859_1 snd_hda_codec_intelhdmi snd_hda_codec_hdmi
snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic
rpcrdma intel_rapl_msr rdma_ucm intel_rapl_common snd_hda_intel
ib_iser snd_hda_codec intel_uncore_frequency
intel_uncore_frequency_common snd_hda_core intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg libiscsi
snd_intel_sdw_acpi coretemp scsi_transport_iscsi snd_hwdep kvm_intel
i915 snd_pcm ib_umad rdma_cm snd_seq_midi ib_ipoib kvm
snd_seq_midi_event iw_cm snd_rawmidi ghash_clmulni_intel ib_cm
aesni_intel snd_seq mei_hdcp drm_buddy rapl snd_seq_device eeepc_wmi
asus_wmi snd_timer intel_cstate ttm snd drm_client_lib
drm_display_helper sparse_keymap soundcore platform_profile mxm_wmi
wmi_bmof joydev mei_me cec acpi_pad mei rc_core drm_kms_helper
input_leds i2c_algo_bit mac_hid sch_fq_codel msr parport_pc ppdev lp
nfsd parport auth_rpcgss binfmt_misc nfs_acl lockd grace drm sunrpc
ramoops efi_pstore
[  894.140414]  reed_solomon pstore_blk pstore_zone autofs4 btrfs
blake2b_generic xor raid6_pq mlx5_ib ib_uverbs ib_core hid_generic uas
usbhid hid r8169 i2c_i801 usb_storage i2c_mux i2c_smbus mlx5_core
realtek ahci mlxfw psample libahci video wmi [last unloaded: ksmbd]
[  894.140442] CPU: 1 UID: 0 PID: 116 Comm: kworker/1:1H Tainted: G
    W           6.18.0-rc5+ #1 PREEMPT(voluntary)
[  894.140447] Tainted: [W]=WARN
[  894.140448] Hardware name: System manufacturer System Product
Name/H110M-K, BIOS 3601 12/12/2017
[  894.140450] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[  894.140476] RIP: 0010:recv_done+0x308/0x360 [ksmbd]
[  894.140487] Code: 2e f2 ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc
cc cc cc 41 8b 55 10 49 8b 75 08 b9 02 00 00 00 e8 ed f4 f2 c3 e9 59
fd ff ff <0f> 0b e9 02 ff ff ff 49 8b 74 24 28 49 8d 94 24 c8 00 00 00
bf 00
[  894.140490] RSP: 0018:ffffa47ec03f3d78 EFLAGS: 00010293
[  894.140492] RAX: 0000000000000001 RBX: ffff8eb84c818000 RCX: 000000010002ba00
[  894.140494] RDX: 0000000037600001 RSI: 0000000000000083 RDI: ffff8eb92ec9ee40
[  894.140496] RBP: ffffa47ec03f3da0 R08: 0000000000000000 R09: 0000000000000010
[  894.140498] R10: ffff8eb801705680 R11: fefefefefefefeff R12: ffff8eb7454b8810
[  894.140499] R13: ffff8eb746deb988 R14: ffff8eb746deb980 R15: ffff8eb84c818000
[  894.140501] FS:  0000000000000000(0000) GS:ffff8eb9a7355000(0000)
knlGS:0000000000000000
[  894.140503] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  894.140505] CR2: 00002d9401d60018 CR3: 0000000010a40006 CR4: 00000000003726f0
[  894.140507] Call Trace:
[  894.140509]  <TASK>
[  894.140512]  __ib_process_cq+0x8e/0x190 [ib_core]
[  894.140530]  ib_cq_poll_work+0x2f/0x90 [ib_core]
[  894.140545]  process_scheduled_works+0xd4/0x430
[  894.140554]  worker_thread+0x12a/0x270
[  894.140558]  kthread+0x10d/0x250
[  894.140564]  ? __pfx_worker_thread+0x10/0x10
[  894.140567]  ? __pfx_kthread+0x10/0x10
[  894.140571]  ret_from_fork+0x11a/0x160
[  894.140574]  ? __pfx_kthread+0x10/0x10
[  894.140577]  ret_from_fork_asm+0x1a/0x30
[  894.140584]  </TASK>
[  894.140585] ---[ end trace 0000000000000000 ]---
[  894.154363] ------------[ cut here ]------------
[  894.154367] WARNING: CPU: 3 PID: 5543 at
fs/smb/server/transport_rdma.c:1728 smb_direct_cm_handler+0x121/0x130
[ksmbd]
[  894.154384] Modules linked in: ksmbd cmac nls_utf8 nls_ucs2_utils
libarc4 nls_iso8859_1 snd_hda_codec_intelhdmi snd_hda_codec_hdmi
snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic
rpcrdma intel_rapl_msr rdma_ucm intel_rapl_common snd_hda_intel
ib_iser snd_hda_codec intel_uncore_frequency
intel_uncore_frequency_common snd_hda_core intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg libiscsi
snd_intel_sdw_acpi coretemp scsi_transport_iscsi snd_hwdep kvm_intel
i915 snd_pcm ib_umad rdma_cm snd_seq_midi ib_ipoib kvm
snd_seq_midi_event iw_cm snd_rawmidi ghash_clmulni_intel ib_cm
aesni_intel snd_seq mei_hdcp drm_buddy rapl snd_seq_device eeepc_wmi
asus_wmi snd_timer intel_cstate ttm snd drm_client_lib
drm_display_helper sparse_keymap soundcore platform_profile mxm_wmi
wmi_bmof joydev mei_me cec acpi_pad mei rc_core drm_kms_helper
input_leds i2c_algo_bit mac_hid sch_fq_codel msr parport_pc ppdev lp
nfsd parport auth_rpcgss binfmt_misc nfs_acl lockd grace drm sunrpc
ramoops efi_pstore
[  894.154456]  reed_solomon pstore_blk pstore_zone autofs4 btrfs
blake2b_generic xor raid6_pq mlx5_ib ib_uverbs ib_core hid_generic uas
usbhid hid r8169 i2c_i801 usb_storage i2c_mux i2c_smbus mlx5_core
realtek ahci mlxfw psample libahci video wmi [last unloaded: ksmbd]
[  894.154483] CPU: 3 UID: 0 PID: 5543 Comm: kworker/3:6 Tainted: G
    W           6.18.0-rc5+ #1 PREEMPT(voluntary)
[  894.154487] Tainted: [W]=WARN
[  894.154488] Hardware name: System manufacturer System Product
Name/H110M-K, BIOS 3601 12/12/2017
[  894.154490] Workqueue: ib_cm cm_work_handler [ib_cm]
[  894.154499] RIP: 0010:smb_direct_cm_handler+0x121/0x130 [ksmbd]
[  894.154507] Code: e7 e8 13 b1 ef ff 44 89 e1 4c 89 ee 48 c7 c7 80
d7 59 c1 48 89 c2 e8 2e 4d ef c3 31 c0 5b 41 5c 41 5d 41 5e 5d c3 cc
cc cc cc <0f> 0b eb a5 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
90 90
[  894.154510] RSP: 0018:ffffa47ec1b27c00 EFLAGS: 00010206
[  894.154512] RAX: ffffffffc1304e00 RBX: ffff8eb89ae50880 RCX: 0000000000000000
[  894.154514] RDX: ffff8eb730960000 RSI: ffffa47ec1b27c60 RDI: ffff8eb7454b9400
[  894.154515] RBP: ffffa47ec1b27c20 R08: 0000000000000002 R09: ffff8eb730b8c18b
[  894.154517] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000009
[  894.154518] R13: ffff8eb7454b9400 R14: ffff8eb7454b8810 R15: ffff8eb815c43000
[  894.154520] FS:  0000000000000000(0000) GS:ffff8eb9a7455000(0000)
knlGS:0000000000000000
[  894.154522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  894.154523] CR2: 00007fe1310e99d0 CR3: 0000000010a40005 CR4: 00000000003726f0
[  894.154525] Call Trace:
[  894.154527]  <TASK>
[  894.154530]  cma_cm_event_handler+0x27/0xd0 [rdma_cm]
[  894.154541]  cma_ib_handler+0x99/0x2e0 [rdma_cm]
[  894.154551]  cm_process_work+0x28/0xf0 [ib_cm]
[  894.154557]  cm_queue_work_unlock+0x41/0xf0 [ib_cm]
[  894.154563]  cm_work_handler+0x2eb/0x25b0 [ib_cm]
[  894.154568]  ? pwq_activate_first_inactive+0x52/0x70
[  894.154572]  ? pwq_dec_nr_in_flight+0x244/0x330
[  894.154575]  process_scheduled_works+0xd4/0x430
[  894.154579]  worker_thread+0x12a/0x270
[  894.154581]  kthread+0x10d/0x250
[  894.154585]  ? __pfx_worker_thread+0x10/0x10
[  894.154587]  ? __pfx_kthread+0x10/0x10
[  894.154590]  ret_from_fork+0x11a/0x160
[  894.154593]  ? __pfx_kthread+0x10/0x10
[  894.154596]  ret_from_fork_asm+0x1a/0x30
[  894.154602]  </TASK>
[  894.154603] ---[ end trace 0000000000000000 ]---
[  894.154931] ksmbd: smb_direct: disconnected
[  894.157278] ksmbd: smb_direct: disconnected

I guess sc->first_error is already set and sc->status
is thus unexpected, so this should avoid the WARN[_ON]_ONCE()
if sc->first_error is already set and have a usable error path.

While there set sc->first_error as soon as possible.

v1 of this patch revealed the real problem with this message:

[  309.560973] expected[NEGOTIATE_NEEDED] != RDMA_CONNECT_RUNNING
first_error=0 local=192.168.0.200:445 remote=192.168.0.100:60445
[  309.561034] WARNING: CPU: 2 PID: 78 at transport_rdma.c:643
recv_done+0x2fa/0x3d0 [ksmbd]

Some drivers (at least mlx5_ib) might post a recv completion before
RDMA_CM_EVENT_ESTABLISHED, so we need to adjust our expectation in that
case.

Fixes: e2d5e51 ("smb: server: only turn into SMBDIRECT_SOCKET_CONNECTED when negotiation is done")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Paulo Alcantara <pc@manguebit.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit ccb61a3 ]

kbd_led_set() can sleep, and so may not be used as the brightness_set()
callback.

Otherwise using this led with a trigger leads to system hangs
accompanied by:
BUG: scheduling while atomic: acpi_fakekeyd/2588/0x00000003
CPU: 4 UID: 0 PID: 2588 Comm: acpi_fakekeyd Not tainted 6.17.9+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.17.9-1
Hardware name: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B9403CVAR/B9403CVAR, BIOS B9403CVAR.311 12/24/2024
Call Trace:
 <TASK>
 [...]
 schedule_timeout+0xbd/0x100
 __down_common+0x175/0x290
 down_timeout+0x67/0x70
 acpi_os_wait_semaphore+0x57/0x90
 [...]
 asus_wmi_evaluate_method3+0x87/0x190 [asus_wmi]
 led_trigger_event+0x3f/0x60
 [...]

Fixes: 9fe44fc ("platform/x86: asus-wmi: Simplify the keyboard brightness updating process")
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Denis Benato <benato.denis96@gmail.com>
Link: https://patch.msgid.link/20251129101307.18085-3-anton@khirnov.net
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
[ Upstream commit d84e47e ]

Since commit a735831 ("drm/nouveau: vendor in drm_encoder_slave API")
nouveau appears to be broken for all dispnv04 GPUs (before NV50). Depending
on the kernel version, either having no display output and hanging in
kernel for a long time, or even oopsing in the cleanup path like:

Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
...
nouveau 0000:0a:00.0: drm: 0x14C5: Parsing digital output script table
BUG: Unable to handle kernel data access on read at 0x00041520
Faulting instruction address: 0xc0003d0001be0844
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash  SMP NR_CPUS=8 NUMA PowerMac
Modules linked in: windfarm_cpufreq_clamp windfarm_smu_sensors windfarm_smu_controls windfarm_pm112 snd_aoa_codec_onyx snd_aoa_fabric_layout snd_aoa windfarm_pid jo
 apple_mfi_fastcharge rndis_host cdc_ether usbnet mii snd_aoa_i2sbus snd_aoa_soundbus snd_pcm snd_timer snd soundcore rack_meter windfarm_smu_sat windfarm_max6690_s
m75_sensor windfarm_core gpu_sched drm_gpuvm drm_exec drm_client_lib drm_ttm_helper ttm drm_display_helper drm_kms_helper drm drm_panel_orientation_quirks syscopyar
_sys_fops i2c_algo_bit backlight uio_pdrv_genirq uio uninorth_agp agpgart zram dm_mod dax ipv6 nfsv4 dns_resolver nfs lockd grace sunrpc offb cfbfillrect cfbimgblt
ont input_leds sr_mod cdrom sd_mod uas ata_generic hid_apple hid_generic usbhid hid usb_storage pata_macio sata_svw libata firewire_ohci scsi_mod firewire_core ohci
ehci_pci ehci_hcd tg3 ohci_hcd libphy usbcore usb_common nls_base
 led_class
CPU: 0 UID: 0 PID: 245 Comm: (udev-worker) Not tainted 6.14.0-09584-g7d06015d936c #7 PREEMPTLAZY
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP:  c0003d0001be0844 LR: c0003d0001be0830 CTR: 0000000000000000
REGS: c0000000053f70e0 TRAP: 0300   Not tainted  (6.14.0-09584-g7d06015d936c)
MSR:  9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24222220  XER: 00000000
DAR: 0000000000041520 DSISR: 40000000 IRQMASK: 0 \x0aGPR00: c0003d0001be0830 c0000000053f7380 c0003d0000911900 c000000007bc6800 \x0aGPR04: 0000000000000000 0000000000000000 c000000007bc6e70 0000000000000001 \x0aGPR08: 01f3040000000000 0000000000041520 0000000000000000 c0003d0000813958 \x0aGPR12: c000000000071a48 c000000000e28000 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000001 c0003d0000928528 \x0aGPR24: c0003d0000928598 0000000000000000 c000000007025480 c000000007025480 \x0aGPR28: c0000000010b4000 0000000000000000 c000000007bc1800 c000000007bc6800
NIP [c0003d0001be0844] nv_crtc_destroy+0x44/0xd4 [nouveau]
LR [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau]
Call Trace:
[c0000000053f7380] [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] (unreliable)
[c0000000053f73c0] [c0003d00007f7bf4] drm_mode_config_cleanup+0x27c/0x30c [drm]
[c0000000053f7490] [c0003d0001bdea50] nouveau_display_create+0x1cc/0x550 [nouveau]
[c0000000053f7500] [c0003d0001bcc29c] nouveau_drm_device_init+0x1c8/0x844 [nouveau]
[c0000000053f75e0] [c0003d0001bcc9ec] nouveau_drm_probe+0xd4/0x1e0 [nouveau]
[c0000000053f7670] [c000000000557d24] local_pci_probe+0x50/0xa8
[c0000000053f76f0] [c000000000557fa8] pci_device_probe+0x22c/0x240
[c0000000053f7760] [c0000000005fff3c] really_probe+0x188/0x31c
[c0000000053f77e0] [c000000000600204] __driver_probe_device+0x134/0x13c
[c0000000053f7860] [c0000000006002c0] driver_probe_device+0x3c/0xb4
[c0000000053f78a0] [c000000000600534] __driver_attach+0x118/0x128
[c0000000053f78e0] [c0000000005fe038] bus_for_each_dev+0xa8/0xf4
[c0000000053f7950] [c0000000005ff460] driver_attach+0x2c/0x40
[c0000000053f7970] [c0000000005fea68] bus_add_driver+0x130/0x278
[c0000000053f7a00] [c00000000060117c] driver_register+0x9c/0x1a0
[c0000000053f7a80] [c00000000055623c] __pci_register_driver+0x5c/0x70
[c0000000053f7aa0] [c0003d0001c058a0] nouveau_drm_init+0x254/0x278 [nouveau]
[c0000000053f7b10] [c00000000000e9bc] do_one_initcall+0x84/0x268
[c0000000053f7bf0] [c0000000001a0ba0] do_init_module+0x70/0x2d8
[c0000000053f7c70] [c0000000001a42bc] init_module_from_file+0xb4/0x108
[c0000000053f7d50] [c0000000001a4504] sys_finit_module+0x1ac/0x478
[c0000000053f7e10] [c000000000023230] system_call_exception+0x1a4/0x20c
[c0000000053f7e50] [c00000000000c554] system_call_common+0xf4/0x258
 --- interrupt: c00 at 0xfd5f988
NIP:  000000000fd5f988 LR: 000000000ff9b148 CTR: 0000000000000000
REGS: c0000000053f7e80 TRAP: 0c00   Not tainted  (6.14.0-09584-g7d06015d936c)
MSR:  100000000000d032 <HV,EE,PR,ME,IR,DR,RI>  CR: 28222244  XER: 00000000
IRQMASK: 0 \x0aGPR00: 0000000000000161 00000000ffcdc2d0 00000000405db160 0000000000000020 \x0aGPR04: 000000000ffa2c9c 0000000000000000 000000000000001f 0000000000000045 \x0aGPR08: 0000000011a13770 0000000000000000 0000000000000000 0000000000000000 \x0aGPR12: 0000000000000000 0000000010249d8c 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000000 0000000011a11a70 \x0aGPR24: 0000000011a13580 0000000011a11950 0000000011a11a70 0000000000020000 \x0aGPR28: 000000000ffa2c9c 0000000000000000 000000000ffafc40 0000000011a11a70
NIP [000000000fd5f988] 0xfd5f988
LR [000000000ff9b148] 0xff9b148
 --- interrupt: c00
Code: f821ffc1 418200ac e93f0000 e9290038 e9291468 eba90000 48026c0d e8410018 e93f06aa 3d290001 392982a4 79291f24 <7fdd482a> 2c3e0000 41820030 7fc3f378
 ---[ end trace 0000000000000000 ]---

This is caused by the i2c encoder modules vendored into nouveau/ now
depending on the equally vendored nouveau_i2c_encoder_destroy
function. Trying to auto-load this modules hangs on nouveau
initialization until timeout, and nouveau continues without i2c video
encoders.

Fix by avoiding nouveau dependency by __always_inlining that helper
functions into those i2c video encoder modules.

Fixes: a735831 ("drm/nouveau: vendor in drm_encoder_slave API")
Signed-off-by: René Rebe <rene@exactco.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[Lyude: fixed commit reference in description]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251202.164952.2216481867721531616.rene@exactco.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
chewitt pushed a commit that referenced this pull request Dec 19, 2025
…3576

When I added command queue engine (CQE) support for the Rockchip eMMC
controller, I missed that RK3576 has a separate platform data struct.
While things are working fine on RK3588 (I tested the ROCK 5B) and
the suspend issue is fixed on the RK3576 (I tested the Sige5), this
results in stability issues. By also adding the necessary hooks for
the RK3576 platform the following problems can be avoided:

[   15.606895] mmc0: running CQE recovery
[   15.616189] mmc0: running CQE recovery
[...]
[   25.911484] mmc0: running CQE recovery
[   25.926305] mmc0: running CQE recovery
[   25.927468] mmc0: running CQE recovery
[...]
[   26.255719] mmc0: running CQE recovery
[   26.257162] ------------[ cut here ]------------
[   26.257581] mmc0: cqhci: spurious TCN for tag 31
[   26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c
[   26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT
[   26.264561] Hardware name: ArmSoM Sige5 (DT)
[...]
[   26.272748] Call trace:
[   26.272964]  cqhci_irq+0x440/0x68c (P)
[   26.273296]  dwcmshc_cqe_irq_handler+0x54/0x88
[   26.273689]  sdhci_irq+0xbc/0x1200
[   26.273991]  __handle_irq_event_percpu+0x54/0x1d0
[...]

Note that the above problems do not necessarily happen with every boot.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/
Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
chewitt pushed a commit that referenced this pull request Dec 29, 2025
A race condition was found in sg_proc_debug_helper(). It was observed on
a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring
/proc/scsi/sg/debug every second. A very large elapsed time would
sometimes appear. This is caused by two race conditions.

We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64
architecture. A patched kernel was built, and the race condition could
not be observed anymore after the application of this patch. A
reproducer C program utilising the scsi_debug module was also built by
Changhui Zhong and can be viewed here:

https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c

The first race happens between the reading of hp->duration in
sg_proc_debug_helper() and request completion in sg_rq_end_io().  The
hp->duration member variable may hold either of two types of
information:

 #1 - The start time of the request. This value is present while
      the request is not yet finished.

 #2 - The total execution time of the request (end_time - start_time).

If sg_proc_debug_helper() executes *after* the value of hp->duration was
changed from #1 to #2, but *before* srp->done is set to 1 in
sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the
elapsed time (value type #2) is subtracted from a timestamp, which
cannot yield a valid elapsed time (which is a type #2 value as well).

To fix this issue, the value of hp->duration must change under the
protection of the sfp->rq_list_lock in sg_rq_end_io().  Since
sg_proc_debug_helper() takes this read lock, the change to srp->done and
srp->header.duration will happen atomically from the perspective of
sg_proc_debug_helper() and the race condition is thus eliminated.

The second race condition happens between sg_proc_debug_helper() and
sg_new_write(). Even though hp->duration is set to the current time
stamp in sg_add_request() under the write lock's protection, it gets
overwritten by a call to get_sg_io_hdr(), which calls copy_from_user()
to copy struct sg_io_hdr from userspace into kernel space. hp->duration
is set to the start time again in sg_common_write(). If
sg_proc_debug_helper() is called between these two calls, an arbitrary
value set by userspace (usually zero) is used to compute the elapsed
time.

To fix this issue, hp->duration must be set to the current timestamp
again after get_sg_io_hdr() returns successfully. A small race window
still exists between get_sg_io_hdr() and setting hp->duration, but this
window is only a few instructions wide and does not result in observable
issues in practice, as confirmed by testing.

Additionally, we fix the format specifier from %d to %u for printing
unsigned int values in sg_proc_debug_helper().

Signed-off-by: Michal Rábek <mrabek@redhat.com>
Suggested-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
irdma_net_event() should not dereference anything from "neigh" (alias
"ptr") until it has checked that the event is NETEVENT_NEIGH_UPDATE.
Other events come with different structures pointed to by "ptr" and they
may be smaller than struct neighbour.

Move the read of neigh->dev under the NETEVENT_NEIGH_UPDATE case.

The bug is mostly harmless, but it triggers KASAN on debug kernels:

 BUG: KASAN: stack-out-of-bounds in irdma_net_event+0x32e/0x3b0 [irdma]
 Read of size 8 at addr ffffc900075e07f0 by task kworker/27:2/542554

 CPU: 27 PID: 542554 Comm: kworker/27:2 Kdump: loaded Not tainted 5.14.0-630.el9.x86_64+debug #1
 Hardware name: [...]
 Workqueue: events rt6_probe_deferred
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x60/0xb0
  print_address_description.constprop.0+0x2c/0x3f0
  print_report+0xb4/0x270
  kasan_report+0x92/0xc0
  irdma_net_event+0x32e/0x3b0 [irdma]
  notifier_call_chain+0x9e/0x180
  atomic_notifier_call_chain+0x5c/0x110
  rt6_do_redirect+0xb91/0x1080
  tcp_v6_err+0xe9b/0x13e0
  icmpv6_notify+0x2b2/0x630
  ndisc_redirect_rcv+0x328/0x530
  icmpv6_rcv+0xc16/0x1360
  ip6_protocol_deliver_rcu+0xb84/0x12e0
  ip6_input_finish+0x117/0x240
  ip6_input+0xc4/0x370
  ipv6_rcv+0x420/0x7d0
  __netif_receive_skb_one_core+0x118/0x1b0
  process_backlog+0xd1/0x5d0
  __napi_poll.constprop.0+0xa3/0x440
  net_rx_action+0x78a/0xba0
  handle_softirqs+0x2d4/0x9c0
  do_softirq+0xad/0xe0
  </IRQ>

Fixes: 915cc7a ("RDMA/irdma: Add miscellaneous utility definitions")
Link: https://patch.msgid.link/r/20251127143150.121099-1-mschmidt@redhat.com
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
In e1000_tbi_should_accept() we read the last byte of the frame via
'data[length - 1]' to evaluate the TBI workaround. If the descriptor-
reported length is zero or larger than the actual RX buffer size, this
read goes out of bounds and can hit unrelated slab objects. The issue
is observed from the NAPI receive path (e1000_clean_rx_irq):

==================================================================
BUG: KASAN: slab-out-of-bounds in e1000_tbi_should_accept+0x610/0x790
Read of size 1 at addr ffff888014114e54 by task sshd/363

CPU: 0 PID: 363 Comm: sshd Not tainted 5.18.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x5a/0x74
 print_address_description+0x7b/0x440
 print_report+0x101/0x200
 kasan_report+0xc1/0xf0
 e1000_tbi_should_accept+0x610/0x790
 e1000_clean_rx_irq+0xa8c/0x1110
 e1000_clean+0xde2/0x3c10
 __napi_poll+0x98/0x380
 net_rx_action+0x491/0xa20
 __do_softirq+0x2c9/0x61d
 do_softirq+0xd1/0x120
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xfe/0x130
 ip_finish_output2+0x7d5/0xb00
 __ip_queue_xmit+0xe24/0x1ab0
 __tcp_transmit_skb+0x1bcb/0x3340
 tcp_write_xmit+0x175d/0x6bd0
 __tcp_push_pending_frames+0x7b/0x280
 tcp_sendmsg_locked+0x2e4f/0x32d0
 tcp_sendmsg+0x24/0x40
 sock_write_iter+0x322/0x430
 vfs_write+0x56c/0xa60
 ksys_write+0xd1/0x190
 do_syscall_64+0x43/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f511b476b10
Code: 73 01 c3 48 8b 0d 88 d3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d f9 2b 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e 9b 01 00 48 89 04 24
RSP: 002b:00007ffc9211d4e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000004024 RCX: 00007f511b476b10
RDX: 0000000000004024 RSI: 0000559a9385962c RDI: 0000000000000003
RBP: 0000559a9383a400 R08: fffffffffffffff0 R09: 0000000000004f00
R10: 0000000000000070 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc9211d57f R14: 0000559a9347bde7 R15: 0000000000000003
 </TASK>
Allocated by task 1:
 __kasan_krealloc+0x131/0x1c0
 krealloc+0x90/0xc0
 add_sysfs_param+0xcb/0x8a0
 kernel_add_sysfs_param+0x81/0xd4
 param_sysfs_builtin+0x138/0x1a6
 param_sysfs_init+0x57/0x5b
 do_one_initcall+0x104/0x250
 do_initcall_level+0x102/0x132
 do_initcalls+0x46/0x74
 kernel_init_freeable+0x28f/0x393
 kernel_init+0x14/0x1a0
 ret_from_fork+0x22/0x30
The buggy address belongs to the object at ffff888014114000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1620 bytes to the right of
 2048-byte region [ffff888014114000, ffff888014114800]
The buggy address belongs to the physical page:
page:ffffea0000504400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14110
head:ffffea0000504400 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x100000000010200(slab|head|node=0|zone=1)
raw: 0100000000010200 0000000000000000 dead000000000001 ffff888013442000
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
==================================================================

This happens because the TBI check unconditionally dereferences the last
byte without validating the reported length first:

	u8 last_byte = *(data + length - 1);

Fix by rejecting the frame early if the length is zero, or if it exceeds
adapter->rx_buffer_len. This preserves the TBI workaround semantics for
valid frames and prevents touching memory beyond the RX buffer.

Fixes: 2037110 ("e1000: move tbi workaround code into helper function")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
A null pointer dereference in handshake_complete() was observed [1].

When handshake_req_next() return NULL in handshake_nl_accept_doit(),
function handshake_complete() will be called unexpectedly which triggers
this crash. Fix it by goto out_status when req is NULL.

[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
RIP: 0010:handshake_complete+0x36/0x2b0 net/handshake/request.c:288
Call Trace:
 <TASK>
 handshake_nl_accept_doit+0x32d/0x7e0 net/handshake/netlink.c:129
 genl_family_rcv_msg_doit+0x204/0x300 net/netlink/genetlink.c:1115
 genl_family_rcv_msg+0x436/0x670 net/netlink/genetlink.c:1195
 genl_rcv_msg+0xcc/0x170 net/netlink/genetlink.c:1210
 netlink_rcv_skb+0x14c/0x430 net/netlink/af_netlink.c:2550
 genl_rcv+0x2d/0x40 net/netlink/genetlink.c:1219
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x878/0xb20 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x897/0xd70 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 ____sys_sendmsg+0xa39/0xbf0 net/socket.c:2592
 ___sys_sendmsg+0x121/0x1c0 net/socket.c:2646
 __sys_sendmsg+0x155/0x200 net/socket.c:2678
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x5f/0x350 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>

Fixes: fe67b06 ("net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/kernel-tls-handshake/aScekpuOYHRM9uOd@morisot.1015granger.net/T/#m7cfa5c11efc626d77622b2981591197a2acdd65e
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251212012723.4111831-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
…nged()

There has been a syzkaller bug reported recently with the following
trace:

list_del corruption, ffff888058bea080->prev is LIST_POISON2 (dead000000000122)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:59!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 21246 Comm: syz.0.2928 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__list_del_entry_valid_or_report+0x13e/0x200 lib/list_debug.c:59
Code: 48 c7 c7 e0 71 f0 8b e8 30 08 ef fc 90 0f 0b 48 89 ef e8 a5 02 55 fd 48 89 ea 48 89 de 48 c7 c7 40 72 f0 8b e8 13 08 ef fc 90 <0f> 0b 48 89 ef e8 88 02 55 fd 48 89 ea 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000d49f370 EFLAGS: 00010286
RAX: 000000000000004e RBX: ffff888058bea080 RCX: ffffc9002817d000
RDX: 0000000000000000 RSI: ffffffff819becc6 RDI: 0000000000000005
RBP: dead000000000122 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: ffff888039e9c230
R13: ffff888058bea088 R14: ffff888058bea080 R15: ffff888055461480
FS:  00007fbbcfe6f6c0(0000) GS:ffff8880d6d0a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c3afcb0 CR3: 00000000382c7000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 __list_del_entry_valid include/linux/list.h:132 [inline]
 __list_del_entry include/linux/list.h:223 [inline]
 list_del_rcu include/linux/rculist.h:178 [inline]
 __team_queue_override_port_del drivers/net/team/team_core.c:826 [inline]
 __team_queue_override_port_del drivers/net/team/team_core.c:821 [inline]
 team_queue_override_port_prio_changed drivers/net/team/team_core.c:883 [inline]
 team_priority_option_set+0x171/0x2f0 drivers/net/team/team_core.c:1534
 team_option_set drivers/net/team/team_core.c:376 [inline]
 team_nl_options_set_doit+0x8ae/0xe60 drivers/net/team/team_core.c:2653
 genl_family_rcv_msg_doit+0x209/0x2f0 net/netlink/genetlink.c:1115
 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
 genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210
 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
 netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346
 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1896
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 ____sys_sendmsg+0xa98/0xc70 net/socket.c:2630
 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2684
 __sys_sendmsg+0x16d/0x220 net/socket.c:2716
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The problem is in this flow:
1) Port is enabled, queue_id != 0, in qom_list
2) Port gets disabled
        -> team_port_disable()
        -> team_queue_override_port_del()
        -> del (removed from list)
3) Port is disabled, queue_id != 0, not in any list
4) Priority changes
        -> team_queue_override_port_prio_changed()
        -> checks: port disabled && queue_id != 0
        -> calls del - hits the BUG as it is removed already

To fix this, change the check in team_queue_override_port_prio_changed()
so it returns early if port is not enabled.

Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
Fixes: 6c31ff3 ("team: remove synchronize_rcu() called during queue override change")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251212102953.167287-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
After the blamed commit below, if the MPC subflow is already in TCP_CLOSE
status or has fallback to TCP at mptcp_disconnect() time,
mptcp_do_fastclose() skips setting the `send_fastclose flag` and the later
__mptcp_close_ssk() does not reset anymore the related subflow context.

Any later connection will be created with both the `request_mptcp` flag
and the msk-level fallback status off (it is unconditionally cleared at
MPTCP disconnect time), leading to a warning in subflow_data_ready():

  WARNING: CPU: 26 PID: 8996 at net/mptcp/subflow.c:1519 subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13))
  Modules linked in:
  CPU: 26 UID: 0 PID: 8996 Comm: syz.22.39 Not tainted 6.18.0-rc7-05427-g11fc074f6c36 #1 PREEMPT(voluntary)
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  RIP: 0010:subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13))
  Code: 90 0f 0b 90 90 e9 04 fe ff ff e8 b7 1e f5 fe 89 ee bf 07 00 00 00 e8 db 19 f5 fe 83 fd 07 0f 84 35 ff ff ff e8 9d 1e f5 fe 90 <0f> 0b 90 e9 27 ff ff ff e8 8f 1e f5 fe 4c 89 e7 48 89 de e8 14 09
  RSP: 0018:ffffc9002646fb30 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88813b218000 RCX: ffffffff825c8435
  RDX: ffff8881300b3580 RSI: ffffffff825c8443 RDI: 0000000000000005
  RBP: 000000000000000b R08: ffffffff825c8435 R09: 000000000000000b
  R10: 0000000000000005 R11: 0000000000000007 R12: ffff888131ac0000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f88330af6c0(0000) GS:ffff888a93dd2000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f88330aefe8 CR3: 000000010ff59000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   tcp_data_ready (net/ipv4/tcp_input.c:5356)
   tcp_data_queue (net/ipv4/tcp_input.c:5445)
   tcp_rcv_state_process (net/ipv4/tcp_input.c:7165)
   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1955)
   __release_sock (include/net/sock.h:1158 (discriminator 6) net/core/sock.c:3180 (discriminator 6))
   release_sock (net/core/sock.c:3737)
   mptcp_sendmsg (net/mptcp/protocol.c:1763 net/mptcp/protocol.c:1857)
   inet_sendmsg (net/ipv4/af_inet.c:853 (discriminator 7))
   __sys_sendto (net/socket.c:727 (discriminator 15) net/socket.c:742 (discriminator 15) net/socket.c:2244 (discriminator 15))
   __x64_sys_sendto (net/socket.c:2247)
   do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  RIP: 0033:0x7f883326702d

Address the issue setting an explicit `fastclosing` flag at fastclose
time, and checking such flag after mptcp_do_fastclose().

Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-2-d1f9fd1c36c8@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
When a page is freed it coalesces with a buddy into a higher order page
while possible.  When the buddy page migrate type differs, it is expected
to be updated to match the one of the page being freed.

However, only the first pageblock of the buddy page is updated, while the
rest of the pageblocks are left unchanged.

That causes warnings in later expand() and other code paths (like below),
since an inconsistency between migration type of the list containing the
page and the page-owned pageblocks migration types is introduced.

[  308.986589] ------------[ cut here ]------------
[  308.987227] page type is 0, passed migratetype is 1 (nr=256)
[  308.987275] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:812 expand+0x23c/0x270
[  308.987293] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[  308.987439] Unloaded tainted modules: hmac_s390(E):2
[  308.987650] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G            E       6.18.0-gcc-bpf-debug torvalds#431 PREEMPT
[  308.987657] Tainted: [E]=UNSIGNED_MODULE
[  308.987661] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[  308.987666] Krnl PSW : 0404f00180000000 00000349976fa600 (expand+0x240/0x270)
[  308.987676]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  308.987682] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[  308.987688]            0000000000000005 0000034980000005 000002be803ac000 0000023efe6c8300
[  308.987692]            0000000000000008 0000034998d57290 000002be00000100 0000023e00000008
[  308.987696]            0000000000000000 0000000000000000 00000349976fa5fc 000002c99b1eb6f0
[  308.987708] Krnl Code: 00000349976fa5f0: c020008a02f2	larl	%r2,000003499883abd4
                          00000349976fa5f6: c0e5ffe3f4b5	brasl	%r14,0000034997378f60
                         #00000349976fa5fc: af000000		mc	0,0
                         >00000349976fa600: a7f4ff4c		brc	15,00000349976fa498
                          00000349976fa604: b9040026		lgr	%r2,%r6
                          00000349976fa608: c0300088317f	larl	%r3,0000034998800906
                          00000349976fa60e: c0e5fffdb6e1	brasl	%r14,00000349976b13d0
                          00000349976fa614: af000000		mc	0,0
[  308.987734] Call Trace:
[  308.987738]  [<00000349976fa600>] expand+0x240/0x270
[  308.987744] ([<00000349976fa5fc>] expand+0x23c/0x270)
[  308.987749]  [<00000349976ff95e>] rmqueue_bulk+0x71e/0x940
[  308.987754]  [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[  308.987759]  [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[  308.987763]  [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[  308.987768]  [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[  308.987774]  [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[  308.987781]  [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[  308.987786]  [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[  308.987791]  [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[  308.987799]  [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[  308.987804]  [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[  308.987809]  [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[  308.987813]  [<000003499734d70e>] do_exception+0x1de/0x540
[  308.987822]  [<0000034998387390>] __do_pgm_check+0x130/0x220
[  308.987830]  [<000003499839a934>] pgm_check_handler+0x114/0x160
[  308.987838] 3 locks held by mempig_verify/5224:
[  308.987842]  #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[  308.987859]  #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[  308.987871]  #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[  308.987886] Last Breaking-Event-Address:
[  308.987890]  [<0000034997379096>] __warn_printk+0x136/0x140
[  308.987897] irq event stamp: 52330356
[  308.987901] hardirqs last  enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[  308.987907] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[  308.987913] softirqs last  enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[  308.987922] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[  308.987929] ---[ end trace 0000000000000000 ]---
[  308.987936] ------------[ cut here ]------------
[  308.987940] page type is 0, passed migratetype is 1 (nr=256)
[  308.987951] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:860 __del_page_from_free_list+0x1be/0x1e0
[  308.987960] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[  308.988070] Unloaded tainted modules: hmac_s390(E):2
[  308.988087] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G        W   E       6.18.0-gcc-bpf-debug torvalds#431 PREEMPT
[  308.988095] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[  308.988100] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[  308.988105] Krnl PSW : 0404f00180000000 00000349976f9e32 (__del_page_from_free_list+0x1c2/0x1e0)
[  308.988118]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  308.988127] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[  308.988133]            0000000000000005 0000034980000005 0000034998d57290 0000023efe6c8300
[  308.988139]            0000000000000001 0000000000000008 000002be00000100 000002be803ac000
[  308.988144]            0000000000000000 0000000000000001 00000349976f9e2e 000002c99b1eb728
[  308.988153] Krnl Code: 00000349976f9e22: c020008a06d9	larl	%r2,000003499883abd4
                          00000349976f9e28: c0e5ffe3f89c	brasl	%r14,0000034997378f60
                         #00000349976f9e2e: af000000		mc	0,0
                         >00000349976f9e32: a7f4ff4e		brc	15,00000349976f9cce
                          00000349976f9e36: b904002b		lgr	%r2,%r11
                          00000349976f9e3a: c030008a06e7	larl	%r3,000003499883ac08
                          00000349976f9e40: c0e5fffdbac8	brasl	%r14,00000349976b13d0
                          00000349976f9e46: af000000		mc	0,0
[  308.988184] Call Trace:
[  308.988188]  [<00000349976f9e32>] __del_page_from_free_list+0x1c2/0x1e0
[  308.988195] ([<00000349976f9e2e>] __del_page_from_free_list+0x1be/0x1e0)
[  308.988202]  [<00000349976ff946>] rmqueue_bulk+0x706/0x940
[  308.988208]  [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[  308.988214]  [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[  308.988221]  [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[  308.988227]  [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[  308.988233]  [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[  308.988240]  [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[  308.988247]  [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[  308.988253]  [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[  308.988260]  [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[  308.988267]  [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[  308.988273]  [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[  308.988279]  [<000003499734d70e>] do_exception+0x1de/0x540
[  308.988286]  [<0000034998387390>] __do_pgm_check+0x130/0x220
[  308.988293]  [<000003499839a934>] pgm_check_handler+0x114/0x160
[  308.988300] 3 locks held by mempig_verify/5224:
[  308.988305]  #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[  308.988322]  #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[  308.988334]  #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[  308.988346] Last Breaking-Event-Address:
[  308.988350]  [<0000034997379096>] __warn_printk+0x136/0x140
[  308.988356] irq event stamp: 52330356
[  308.988360] hardirqs last  enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[  308.988366] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[  308.988373] softirqs last  enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[  308.988380] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[  308.988388] ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/20251215081002.3353900A9c-agordeev@linux.ibm.com
Link: https://lkml.kernel.org/r/20251212151457.3898073Add-agordeev@linux.ibm.com
Fixes: e6cf9e1 ("mm: page_alloc: fix up block types when merging compatible blocks")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Closes: https://lore.kernel.org/linux-mm/87wmalyktd.fsf@linux.ibm.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Marc Hartmayer <mhartmay@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().

mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
invoke the driver's remove path and free the gdma_context associated
with the device. After returning, mana_serv_reset() currently jumps to
the out label and attempts to clear gc->in_service, dereferencing a
freed gdma_context.

The issue was observed with the following call logs:
[  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
[  698.943121] #PF: supervisor write access in kernel mode
[  698.943423] #PF: error_code(0x0002) - not-present page
[S[  698.943793] Pat Dec  6 07:GD5 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
0:56 2025] hv_[n e 698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
tvsc f8615163-00[  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
...
[Sat Dec  6 07:50:56 2025] R10: [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
[Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
[Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
[Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
[Sat Dec  6 07:50:56 2025] Call Trace:
[Sat Dec  6 07:50:56 2025]  <TASK>
[Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
[Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
[Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
[Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
[Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
[Sat Dec  6 07:50:56 2025]  </TASK>

Fix this by returning immediately after mana_serv_rescan() to avoid
accessing GC state that may no longer be valid.

Fixes: 9bf6603 ("net: mana: Handle hardware recovery events when probing the device")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
There is a crash issue when running zero copy XDP_TX action, the crash
log is shown below.

[  216.122464] Unable to handle kernel paging request at virtual address fffeffff80000000
[  216.187524] Internal error: Oops: 0000000096000144 [#1]  SMP
[  216.301694] Call trace:
[  216.304130]  dcache_clean_poc+0x20/0x38 (P)
[  216.308308]  __dma_sync_single_for_device+0x1bc/0x1e0
[  216.313351]  stmmac_xdp_xmit_xdpf+0x354/0x400
[  216.317701]  __stmmac_xdp_run_prog+0x164/0x368
[  216.322139]  stmmac_napi_poll_rxtx+0xba8/0xf00
[  216.326576]  __napi_poll+0x40/0x218
[  216.408054] Kernel panic - not syncing: Oops: Fatal exception in interrupt

For XDP_TX action, the xdp_buff is converted to xdp_frame by
xdp_convert_buff_to_frame(). The memory type of the resulting xdp_frame
depends on the memory type of the xdp_buff. For page pool based xdp_buff
it produces xdp_frame with memory type MEM_TYPE_PAGE_POOL. For zero copy
XSK pool based xdp_buff it produces xdp_frame with memory type
MEM_TYPE_PAGE_ORDER0. However, stmmac_xdp_xmit_back() does not check the
memory type and always uses the page pool type, this leads to invalid
mappings and causes the crash. Therefore, check the xdp_buff memory type
in stmmac_xdp_xmit_back() to fix this issue.

Fixes: bba2556 ("net: stmmac: Enable RX via AF_XDP zero-copy")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20251204071332.1907111-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
syzbot reported a crash [1] in dql_completed() after recent usbnet
BQL adoption.

The reason for the crash is that netdev_reset_queue() is called too soon.

It should be called after cancel_work_sync(&dev->bh_work) to make
sure no more TX completion can happen.

[1]
kernel BUG at lib/dynamic_queue_limits.c:99 !
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 5197 Comm: udevd Tainted: G             L      syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
 RIP: 0010:dql_completed+0xbe1/0xbf0 lib/dynamic_queue_limits.c:99
Call Trace:
 <IRQ>
  netdev_tx_completed_queue include/linux/netdevice.h:3864 [inline]
  netdev_completed_queue include/linux/netdevice.h:3894 [inline]
  usbnet_bh+0x793/0x1020 drivers/net/usb/usbnet.c:1601
  process_one_work kernel/workqueue.c:3257 [inline]
  process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
  bh_worker+0x2b1/0x600 kernel/workqueue.c:3611
  tasklet_action+0xc/0x70 kernel/softirq.c:952
  handle_softirqs+0x27d/0x850 kernel/softirq.c:622
  __do_softirq kernel/softirq.c:656 [inline]
  invoke_softirq kernel/softirq.c:496 [inline]
  __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:723
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:739

Fixes: 7ff14c5 ("usbnet: Add support for Byte Queue Limits (BQL)")
Reported-by: syzbot+5b55e49f8bbd84631a9c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6945644f.a70a0220.207337.0113.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Schippers <simon.schippers@tu-dortmund.de>
Link: https://patch.msgid.link/20251219144459.692715-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
… to macb_open()

In the non-RT kernel, local_bh_disable() merely disables preemption,
whereas it maps to an actual spin lock in the RT kernel. Consequently,
when attempting to refill RX buffers via netdev_alloc_skb() in
macb_mac_link_up(), a deadlock scenario arises as follows:

   WARNING: possible circular locking dependency detected
   6.18.0-08691-g2061f18ad76e torvalds#39 Not tainted
   ------------------------------------------------------
   kworker/0:0/8 is trying to acquire lock:
   ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c

   but task is already holding lock:
   ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit
   +0x148/0xb7c

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (&queue->tx_ptr_lock){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x148/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #2 (_xmit_ETHER#2){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          sch_direct_xmit+0x11c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}:
          lock_release+0x250/0x348
          __local_bh_enable_ip+0x7c/0x240
          __netdev_alloc_skb+0x1b4/0x1d8
          gem_rx_refill+0xdc/0x240
          gem_init_rings+0xb4/0x108
          macb_mac_link_up+0x9c/0x2b4
          phylink_resolve+0x170/0x614
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #0 (&bp->lock){+.+.}-{3:3}:
          __lock_acquire+0x15a8/0x2084
          lock_acquire+0x1cc/0x350
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x808/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&queue->tx_ptr_lock);
                                  lock(_xmit_ETHER#2);
                                  lock(&queue->tx_ptr_lock);
     lock(&bp->lock);

    *** DEADLOCK ***

   Call trace:
    show_stack+0x18/0x24 (C)
    dump_stack_lvl+0xa0/0xf0
    dump_stack+0x18/0x24
    print_circular_bug+0x28c/0x370
    check_noncircular+0x198/0x1ac
    __lock_acquire+0x15a8/0x2084
    lock_acquire+0x1cc/0x350
    rt_spin_lock+0x50/0x1f0
    macb_start_xmit+0x808/0xb7c
    dev_hard_start_xmit+0x94/0x284
    sch_direct_xmit+0x8c/0x37c
    __dev_queue_xmit+0x708/0x1120
    neigh_resolve_output+0x148/0x28c
    ip6_finish_output2+0x2c0/0xb2c
    __ip6_finish_output+0x114/0x308
    ip6_output+0xc4/0x4a4
    mld_sendpack+0x220/0x68c
    mld_ifc_work+0x2a8/0x4f4
    process_one_work+0x20c/0x5f8
    worker_thread+0x1b0/0x35c
    kthread+0x144/0x200
    ret_from_fork+0x10/0x20

Notably, invoking the mog_init_rings() callback upon link establishment
is unnecessary. Instead, we can exclusively call mog_init_rings() within
the ndo_open() callback. This adjustment resolves the deadlock issue.
Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings()
when opening the network interface via at91ether_open(), moving
mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC
check.

Fixes: 633e98a ("net: macb: use resolved link config in mac_link_up()")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
This reverts commit f804a58.

This change introduced the following panic, and mt792x_load_firmware()
fails. wifi is dead on systems with mt792x wireless.

kern  :crit  : kernel BUG at lib/string_helpers.c:1043!
kern  :warn  : Oops: invalid opcode: 0000 [#1] SMP NOPTI
kern  :warn  : CPU: 14 UID: 0 PID: 61 Comm: kworker/14:0 Tainted: G        W
        6.19.0-rc1 #1 PREEMPT(voluntary)
kern  :warn  : Tainted: [W]=WARN
kern  :warn  : Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.16 07/25/2025
kern  :warn  : Workqueue: events mt7921_init_work [mt7921_common]
kern  :warn  : RIP: 0010:__fortify_panic+0xd/0xf
kern  :warn  : Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 40 0f b6 ff e8 c3 55 71 00 <0f> 0b 48 8b 54 24 10 48 8b 74 24 08 4c 89 e9 48 c7 c7 00 a2 d5 a0
kern  :warn  : RSP: 0018:ffffa7a5c03a3d10 EFLAGS: 00010246
kern  :warn  : RAX: ffffffffa0d7aaf2 RBX: 0000000000000000 RCX: ffffffffa0d7aaf2
kern  :warn  : RDX: 0000000000000011 RSI: ffffffffa0d5a170 RDI: ffffffffa128db10
kern  :warn  : RBP: ffff91650ae52060 R08: 0000000000000010 R09: ffffa7a5c31b2000
kern  :warn  : R10: ffffa7a5c03a3bf0 R11: 00000000ffffffff R12: 0000000000000000
kern  :warn  : R13: ffffa7a5c31b2000 R14: 0000000000001000 R15: 0000000000000000
kern  :warn  : FS:  0000000000000000(0000) GS:ffff91743e664000(0000) knlGS:0000000000000000
kern  :warn  : CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern  :warn  : CR2: 00007f10786c241c CR3: 00000003eca24000 CR4: 0000000000f50ef0
kern  :warn  : PKRU: 55555554
kern  :warn  : Call Trace:
kern  :warn  :  <TASK>
kern  :warn  :  mt76_connac2_load_patch.cold+0x2b/0xa41 [mt76_connac_lib]
kern  :warn  :  ? srso_alias_return_thunk+0x5/0xfbef5
kern  :warn  :  mt792x_load_firmware+0x36/0x150 [mt792x_lib]
kern  :warn  :  mt7921_run_firmware+0x2c/0x4a0 [mt7921_common]
kern  :warn  :  ? srso_alias_return_thunk+0x5/0xfbef5
kern  :warn  :  ? mt7921_rr+0x12/0x30 [mt7921e]
kern  :warn  :  ? srso_alias_return_thunk+0x5/0xfbef5
kern  :warn  :  ? ____mt76_poll_msec+0x75/0xb0 [mt76]
kern  :warn  :  mt7921e_mcu_init+0x4c/0x7a [mt7921e]
kern  :warn  :  mt7921_init_work+0x51/0x190 [mt7921_common]
kern  :warn  :  process_one_work+0x18b/0x340
kern  :warn  :  worker_thread+0x256/0x3a0
kern  :warn  :  ? __pfx_worker_thread+0x10/0x10
kern  :warn  :  kthread+0xfc/0x240
kern  :warn  :  ? __pfx_kthread+0x10/0x10
kern  :warn  :  ? __pfx_kthread+0x10/0x10
kern  :warn  :  ret_from_fork+0x254/0x290
kern  :warn  :  ? __pfx_kthread+0x10/0x10
kern  :warn  :  ret_from_fork_asm+0x1a/0x30
kern  :warn  :  </TASK>

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
chewitt pushed a commit that referenced this pull request Jan 5, 2026
ctx->tcxt_list holds the tasks using this ring, and it's currently
protected by the normal ctx->uring_lock. However, this can cause a
circular locking issue, as reported by syzbot, where cancelations off
exec end up needing to remove an entry from this list:

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
------------------------------------------------------
syz.0.9999/12287 is trying to acquire lock:
ffff88805851c0a8 (&ctx->uring_lock){+.+.}-{4:4}, at: io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179

but task is already holding lock:
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sig->cred_guard_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
       proc_pid_attr_write+0x547/0x630 fs/proc/base.c:2837
       vfs_write+0x27e/0xb30 fs/read_write.c:684
       ksys_write+0x145/0x250 fs/read_write.c:738
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (sb_writers#3){.+.+}-{0:0}:
       percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
       percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
       __sb_start_write include/linux/fs/super.h:19 [inline]
       sb_start_write+0x4d/0x1c0 include/linux/fs/super.h:125
       mnt_want_write+0x41/0x90 fs/namespace.c:499
       open_last_lookups fs/namei.c:4529 [inline]
       path_openat+0xadd/0x3dd0 fs/namei.c:4784
       do_filp_open+0x1fa/0x410 fs/namei.c:4814
       io_openat2+0x3e0/0x5c0 io_uring/openclose.c:143
       __io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1792
       io_issue_sqe+0x165/0x1060 io_uring/io_uring.c:1815
       io_queue_sqe io_uring/io_uring.c:2042 [inline]
       io_submit_sqe io_uring/io_uring.c:2320 [inline]
       io_submit_sqes+0xbf4/0x2140 io_uring/io_uring.c:2434
       __do_sys_io_uring_enter io_uring/io_uring.c:3280 [inline]
       __se_sys_io_uring_enter+0x2e0/0x2b60 io_uring/io_uring.c:3219
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&ctx->uring_lock){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
       lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
       io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
       io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
       io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
       io_uring_task_cancel include/linux/io_uring.h:24 [inline]
       begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
       load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
       search_binary_handler fs/exec.c:1669 [inline]
       exec_binprm fs/exec.c:1701 [inline]
       bprm_execve+0x92e/0x1400 fs/exec.c:1753
       do_execveat_common+0x510/0x6a0 fs/exec.c:1859
       do_execve fs/exec.c:1933 [inline]
       __do_sys_execve fs/exec.c:2009 [inline]
       __se_sys_execve fs/exec.c:2004 [inline]
       __x64_sys_execve+0x94/0xb0 fs/exec.c:2004
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  &ctx->uring_lock --> sb_writers#3 --> &sig->cred_guard_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sig->cred_guard_mutex);
                               lock(sb_writers#3);
                               lock(&sig->cred_guard_mutex);
  lock(&ctx->uring_lock);

 *** DEADLOCK ***

1 lock held by syz.0.9999/12287:
 #0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
 #0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733

stack backtrace:
CPU: 0 UID: 0 PID: 12287 Comm: syz.0.9999 Tainted: G             L      syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_circular_bug+0x2e2/0x300 kernel/locking/lockdep.c:2043
 check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
 lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
 __mutex_lock_common kernel/locking/mutex.c:614 [inline]
 __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
 io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
 io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
 io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
 io_uring_task_cancel include/linux/io_uring.h:24 [inline]
 begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
 load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
 search_binary_handler fs/exec.c:1669 [inline]
 exec_binprm fs/exec.c:1701 [inline]
 bprm_execve+0x92e/0x1400 fs/exec.c:1753
 do_execveat_common+0x510/0x6a0 fs/exec.c:1859
 do_execve fs/exec.c:1933 [inline]
 __do_sys_execve fs/exec.c:2009 [inline]
 __se_sys_execve fs/exec.c:2004 [inline]
 __x64_sys_execve+0x94/0xb0 fs/exec.c:2004
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff3a8b8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a9a97038 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00007ff3a8de5fa0 RCX: 00007ff3a8b8f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000400
RBP: 00007ff3a8c13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff3a8de6038 R14: 00007ff3a8de5fa0 R15: 00007ff3a8f0fa28
 </TASK>

Add a separate lock just for the tctx_list, tctx_lock. This can nest
under ->uring_lock, where necessary, and be used separately for list
manipulation. For the cancelation off exec side, this removes the
need to grab ->uring_lock, hence fixing the circular locking
dependency.

Reported-by: syzbot+b0e3b77ffaa8a4067ce5@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants