Skip to content

Conversation

@U2FsdGVkX1
Copy link

Adapt to the kernel API change where dw_pcie_host_ops.host_init was renamed to init in commit torvalds@aea370b

Adapt to the kernel API change where dw_pcie_host_ops.host_init
was renamed to init in commit torvalds@aea370b

Signed-off-by: U2FsdGVkX1 <U2FsdGVkX1@gmail.com>
@RevySR RevySR merged commit c91ffa8 into RevySR:rv/6.16/dp1000/dev Sep 30, 2025
RevySR pushed a commit that referenced this pull request Oct 12, 2025
commit 3e31a6b upstream.

There is a bug observed when rtw89_core_tx_kick_off_and_wait() tries to
access already freed skb_data:

 BUG: KFENCE: use-after-free write in rtw89_core_tx_kick_off_and_wait drivers/net/wireless/realtek/rtw89/core.c:1110

 CPU: 6 UID: 0 PID: 41377 Comm: kworker/u64:24 Not tainted  6.17.0-rc1+ #1 PREEMPT(lazy)
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
 Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]

 Use-after-free write at 0x0000000020309d9d (in kfence-torvalds#251):
 rtw89_core_tx_kick_off_and_wait drivers/net/wireless/realtek/rtw89/core.c:1110
 rtw89_core_scan_complete drivers/net/wireless/realtek/rtw89/core.c:5338
 rtw89_hw_scan_complete_cb drivers/net/wireless/realtek/rtw89/fw.c:7979
 rtw89_chanctx_proceed_cb drivers/net/wireless/realtek/rtw89/chan.c:3165
 rtw89_chanctx_proceed drivers/net/wireless/realtek/rtw89/chan.h:141
 rtw89_hw_scan_complete drivers/net/wireless/realtek/rtw89/fw.c:8012
 rtw89_mac_c2h_scanofld_rsp drivers/net/wireless/realtek/rtw89/mac.c:5059
 rtw89_fw_c2h_work drivers/net/wireless/realtek/rtw89/fw.c:6758
 process_one_work kernel/workqueue.c:3241
 worker_thread kernel/workqueue.c:3400
 kthread kernel/kthread.c:463
 ret_from_fork arch/x86/kernel/process.c:154
 ret_from_fork_asm arch/x86/entry/entry_64.S:258

 kfence-torvalds#251: 0x0000000056e2393d-0x000000009943cb62, size=232, cache=skbuff_head_cache

 allocated by task 41377 on cpu 6 at 77869.159548s (0.009551s ago):
 __alloc_skb net/core/skbuff.c:659
 __netdev_alloc_skb net/core/skbuff.c:734
 ieee80211_nullfunc_get net/mac80211/tx.c:5844
 rtw89_core_send_nullfunc drivers/net/wireless/realtek/rtw89/core.c:3431
 rtw89_core_scan_complete drivers/net/wireless/realtek/rtw89/core.c:5338
 rtw89_hw_scan_complete_cb drivers/net/wireless/realtek/rtw89/fw.c:7979
 rtw89_chanctx_proceed_cb drivers/net/wireless/realtek/rtw89/chan.c:3165
 rtw89_chanctx_proceed drivers/net/wireless/realtek/rtw89/chan.c:3194
 rtw89_hw_scan_complete drivers/net/wireless/realtek/rtw89/fw.c:8012
 rtw89_mac_c2h_scanofld_rsp drivers/net/wireless/realtek/rtw89/mac.c:5059
 rtw89_fw_c2h_work drivers/net/wireless/realtek/rtw89/fw.c:6758
 process_one_work kernel/workqueue.c:3241
 worker_thread kernel/workqueue.c:3400
 kthread kernel/kthread.c:463
 ret_from_fork arch/x86/kernel/process.c:154
 ret_from_fork_asm arch/x86/entry/entry_64.S:258

 freed by task 1045 on cpu 9 at 77869.168393s (0.001557s ago):
 ieee80211_tx_status_skb net/mac80211/status.c:1117
 rtw89_pci_release_txwd_skb drivers/net/wireless/realtek/rtw89/pci.c:564
 rtw89_pci_release_tx_skbs.isra.0 drivers/net/wireless/realtek/rtw89/pci.c:651
 rtw89_pci_release_tx drivers/net/wireless/realtek/rtw89/pci.c:676
 rtw89_pci_napi_poll drivers/net/wireless/realtek/rtw89/pci.c:4238
 __napi_poll net/core/dev.c:7495
 net_rx_action net/core/dev.c:7557 net/core/dev.c:7684
 handle_softirqs kernel/softirq.c:580
 do_softirq.part.0 kernel/softirq.c:480
 __local_bh_enable_ip kernel/softirq.c:407
 rtw89_pci_interrupt_threadfn drivers/net/wireless/realtek/rtw89/pci.c:927
 irq_thread_fn kernel/irq/manage.c:1133
 irq_thread kernel/irq/manage.c:1257
 kthread kernel/kthread.c:463
 ret_from_fork arch/x86/kernel/process.c:154
 ret_from_fork_asm arch/x86/entry/entry_64.S:258

It is a consequence of a race between the waiting and the signaling side
of the completion:

            Waiting thread                            Completing thread

rtw89_core_tx_kick_off_and_wait()
  rcu_assign_pointer(skb_data->wait, wait)
  /* start waiting */
  wait_for_completion_timeout()
                                                rtw89_pci_tx_status()
                                                  rtw89_core_tx_wait_complete()
                                                    rcu_read_lock()
                                                    /* signals completion and
                                                     * proceeds further
                                                     */
                                                    complete(&wait->completion)
                                                    rcu_read_unlock()
                                                  ...
                                                  /* frees skb_data */
                                                  ieee80211_tx_status_ni()
  /* returns (exit status doesn't matter) */
  wait_for_completion_timeout()
  ...
  /* accesses the already freed skb_data */
  rcu_assign_pointer(skb_data->wait, NULL)

The completing side might proceed and free the underlying skb even before
the waiting side is fully awoken and run to execution.  Actually the race
happens regardless of wait_for_completion_timeout() exit status, e.g.
the waiting side may hit a timeout and the concurrent completing side is
still able to free the skb.

Skbs which are sent by rtw89_core_tx_kick_off_and_wait() are owned by the
driver.  They don't come from core ieee80211 stack so no need to pass them
to ieee80211_tx_status_ni() on completing side.

Introduce a work function which will act as a garbage collector for
rtw89_tx_wait_info objects and the associated skbs.  Thus no potentially
heavy locks are required on the completing side.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 1ae5ca6 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Suggested-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-2-pchelkin@ispras.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 13, 2025
commit 674b56a upstream.

Syzkaller reports a KASAN issue as below:

general protection fault, probably for non-canonical address 0xfbd59c0000000021: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000108-0xdead00000000010f]
CPU: 0 PID: 5083 Comm: syz-executor.2 Not tainted 6.1.134-syzkaller-00037-g855bd1d7d838 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__list_del include/linux/list.h:114 [inline]
RIP: 0010:__list_del_entry include/linux/list.h:137 [inline]
RIP: 0010:list_del include/linux/list.h:148 [inline]
RIP: 0010:p9_fd_cancelled+0xe9/0x200 net/9p/trans_fd.c:734

Call Trace:
 <TASK>
 p9_client_flush+0x351/0x440 net/9p/client.c:614
 p9_client_rpc+0xb6b/0xc70 net/9p/client.c:734
 p9_client_version net/9p/client.c:920 [inline]
 p9_client_create+0xb51/0x1240 net/9p/client.c:1027
 v9fs_session_init+0x1f0/0x18f0 fs/9p/v9fs.c:408
 v9fs_mount+0xba/0xcb0 fs/9p/vfs_super.c:126
 legacy_get_tree+0x108/0x220 fs/fs_context.c:632
 vfs_get_tree+0x8e/0x300 fs/super.c:1573
 do_new_mount fs/namespace.c:3056 [inline]
 path_mount+0x6a6/0x1e90 fs/namespace.c:3386
 do_mount fs/namespace.c:3399 [inline]
 __do_sys_mount fs/namespace.c:3607 [inline]
 __se_sys_mount fs/namespace.c:3584 [inline]
 __x64_sys_mount+0x283/0x300 fs/namespace.c:3584
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

This happens because of a race condition between:

- The 9p client sending an invalid flush request and later cleaning it up;
- The 9p client in p9_read_work() canceled all pending requests.

      Thread 1                              Thread 2
    ...
    p9_client_create()
    ...
    p9_fd_create()
    ...
    p9_conn_create()
    ...
    // start Thread 2
    INIT_WORK(&m->rq, p9_read_work);
                                        p9_read_work()
    ...
    p9_client_rpc()
    ...
                                        ...
                                        p9_conn_cancel()
                                        ...
                                        spin_lock(&m->req_lock);
    ...
    p9_fd_cancelled()
    ...
                                        ...
                                        spin_unlock(&m->req_lock);
                                        // status rewrite
                                        p9_client_cb(m->client, req, REQ_STATUS_ERROR)
                                        // first remove
                                        list_del(&req->req_list);
                                        ...

    spin_lock(&m->req_lock)
    ...
    // second remove
    list_del(&req->req_list);
    spin_unlock(&m->req_lock)
  ...

Commit 74d6a5d ("9p/trans_fd: Fix concurrency del of req_list in
p9_fd_cancelled/p9_read_work") fixes a concurrency issue in the 9p filesystem
client where the req_list could be deleted simultaneously by both
p9_read_work and p9_fd_cancelled functions, but for the case where req->status
equals REQ_STATUS_RCVD.

Update the check for req->status in p9_fd_cancelled to skip processing not
just received requests, but anything that is not SENT, as whatever
changed the state from SENT also removed the request from its list.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: afd8d65 ("9P: Add cancelled() to the transport functions.")
Cc: stable@vger.kernel.org
Signed-off-by: Nalivayko Sergey <Sergey.Nalivayko@kaspersky.com>
Message-ID: <20250715154815.3501030-1-Sergey.Nalivayko@kaspersky.com>
[updated the check from status == RECV || status == ERROR to status != SENT]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 13, 2025
commit c18ecd9 upstream.

As syzbot reported below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/file.c:1243!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5354 Comm: syz.0.0 Not tainted 6.17.0-rc1-syzkaller-00211-g90d970cade8e #0 PREEMPT(full)
RIP: 0010:f2fs_truncate_hole+0x69e/0x6c0 fs/f2fs/file.c:1243
Call Trace:
 <TASK>
 f2fs_punch_hole+0x2db/0x330 fs/f2fs/file.c:1306
 f2fs_fallocate+0x546/0x990 fs/f2fs/file.c:2018
 vfs_fallocate+0x666/0x7e0 fs/open.c:342
 ksys_fallocate fs/open.c:366 [inline]
 __do_sys_fallocate fs/open.c:371 [inline]
 __se_sys_fallocate fs/open.c:369 [inline]
 __x64_sys_fallocate+0xc0/0x110 fs/open.c:369
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1e65f8ebe9

w/ a fuzzed image, f2fs may encounter panic due to it detects inconsistent
truncation range in direct node in f2fs_truncate_hole().

The root cause is: a non-inode dnode may has the same footer.ino and
footer.nid, so the dnode will be parsed as an inode, then ADDRS_PER_PAGE()
may return wrong blkaddr count which may be 923 typically, by chance,
dn.ofs_in_node is equal to 923, then count can be calculated to 0 in below
statement, later it will trigger panic w/ f2fs_bug_on(, count == 0 || ...).

	count = min(end_offset - dn.ofs_in_node, pg_end - pg_start);

This patch introduces a new node_type NODE_TYPE_NON_INODE, then allowing
passing the new_type to sanity_check_node_footer in f2fs_get_node_folio()
to detect corruption that a non-inode dnode has the same footer.ino and
footer.nid.

Scripts to reproduce:
mkfs.f2fs -f /dev/vdb
mount /dev/vdb /mnt/f2fs
touch /mnt/f2fs/foo
touch /mnt/f2fs/bar
dd if=/dev/zero of=/mnt/f2fs/foo bs=1M count=8
umount /mnt/f2fs
inject.f2fs --node --mb i_nid --nid 4 --idx 0 --val 5 /dev/vdb
mount /dev/vdb /mnt/f2fs
xfs_io /mnt/f2fs/foo -c "fpunch 6984k 4k"

Cc: stable@kernel.org
Reported-by: syzbot+b9c7ffd609c3f09416ab@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68a68e27.050a0220.1a3988.0002.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit bea3e1d ]

BUG: KASAN: slab-out-of-bounds in hfsplus_uni2asc+0xa71/0xb90 fs/hfsplus/unicode.c:186
Read of size 2 at addr ffff8880289ef218 by task syz.6.248/14290

CPU: 0 UID: 0 PID: 14290 Comm: syz.6.248 Not tainted 6.16.4 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x5f0 mm/kasan/report.c:482
 kasan_report+0xca/0x100 mm/kasan/report.c:595
 hfsplus_uni2asc+0xa71/0xb90 fs/hfsplus/unicode.c:186
 hfsplus_listxattr+0x5b6/0xbd0 fs/hfsplus/xattr.c:738
 vfs_listxattr+0xbe/0x140 fs/xattr.c:493
 listxattr+0xee/0x190 fs/xattr.c:924
 filename_listxattr fs/xattr.c:958 [inline]
 path_listxattrat+0x143/0x360 fs/xattr.c:988
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcb/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe0e9fae16d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe0eae67f98 EFLAGS: 00000246 ORIG_RAX: 00000000000000c3
RAX: ffffffffffffffda RBX: 00007fe0ea205fa0 RCX: 00007fe0e9fae16d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000000
RBP: 00007fe0ea0480f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe0ea206038 R14: 00007fe0ea205fa0 R15: 00007fe0eae48000
 </TASK>

Allocated by task 14290:
 kasan_save_stack+0x24/0x50 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
 kasan_kmalloc include/linux/kasan.h:260 [inline]
 __do_kmalloc_node mm/slub.c:4333 [inline]
 __kmalloc_noprof+0x219/0x540 mm/slub.c:4345
 kmalloc_noprof include/linux/slab.h:909 [inline]
 hfsplus_find_init+0x95/0x1f0 fs/hfsplus/bfind.c:21
 hfsplus_listxattr+0x331/0xbd0 fs/hfsplus/xattr.c:697
 vfs_listxattr+0xbe/0x140 fs/xattr.c:493
 listxattr+0xee/0x190 fs/xattr.c:924
 filename_listxattr fs/xattr.c:958 [inline]
 path_listxattrat+0x143/0x360 fs/xattr.c:988
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcb/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

When hfsplus_uni2asc is called from hfsplus_listxattr,
it actually passes in a struct hfsplus_attr_unistr*.
The size of the corresponding structure is different from that of hfsplus_unistr,
so the previous fix (9445878) is insufficient.
The pointer on the unicode buffer is still going beyond the allocated memory.

This patch introduces two warpper functions hfsplus_uni2asc_xattr_str and
hfsplus_uni2asc_str to process two unicode buffers,
struct hfsplus_attr_unistr* and struct hfsplus_unistr* respectively.
When ustrlen value is bigger than the allocated memory size,
the ustrlen value is limited to an safe size.

Fixes: 9445878 ("hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()")
Signed-off-by: Kang Chen <k.chen@smail.nju.edu.cn>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250909031316.1647094-1-k.chen@smail.nju.edu.cn
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit 67378b7 ]

[BUG DURING BS > PS TEST]
When running the following script on a btrfs whose block size is larger
than page size, e.g. 8K block size and 4K page size, it will trigger a
kernel BUG:

  # mkfs.btrfs -s 8k $dev
  # mount $dev $mnt
  # mkdir $mnt/dir
  # ln -s dir $mnt/link
  # ls $mnt/link

The call trace looks like this:

  BTRFS warning (device dm-2): support for block size 8192 with page size 4096 is experimental, some features may be missing
  BTRFS info (device dm-2): checking UUID tree
  BTRFS info (device dm-2): enabling ssd optimizations
  BTRFS info (device dm-2): enabling free space tree
  ------------[ cut here ]------------
  kernel BUG at /home/adam/linux/include/linux/highmem.h:275!
  Oops: invalid opcode: 0000 [#1] SMP
  CPU: 8 UID: 0 PID: 667 Comm: ls Tainted: G           OE       6.17.0-rc4-custom+ torvalds#283 PREEMPT(full)
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
  RIP: 0010:zero_user_segments.constprop.0+0xdc/0xe0 [btrfs]
  Call Trace:
   <TASK>
   btrfs_get_extent.cold+0x85/0x101 [btrfs 7453c70c03e631c8d8bfdd4264fa62d3e238da6f]
   btrfs_do_readpage+0x244/0x750 [btrfs 7453c70c03e631c8d8bfdd4264fa62d3e238da6f]
   btrfs_read_folio+0x9c/0x100 [btrfs 7453c70c03e631c8d8bfdd4264fa62d3e238da6f]
   filemap_read_folio+0x37/0xe0
   do_read_cache_folio+0x94/0x3e0
   __page_get_link.isra.0+0x20/0x90
   page_get_link+0x16/0x40
   step_into+0x69b/0x830
   path_lookupat+0xa7/0x170
   filename_lookup+0xf7/0x200
   ? set_ptes.isra.0+0x36/0x70
   vfs_statx+0x7a/0x160
   do_statx+0x63/0xa0
   __x64_sys_statx+0x90/0xe0
   do_syscall_64+0x82/0xae0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>

Please note bs > ps support is still under development and the
enablement patch is not even in btrfs development branch.

[CAUSE]
Btrfs reuses its data folio read path to handle symbolic links, as the
symbolic link target is stored as an inline data extent.

But for newly created inodes, btrfs only set the minimal order if the
target inode is a regular file.

Thus for above newly created symbolic link, it doesn't properly respect
the minimal folio order, and triggered the above crash.

[FIX]
Call btrfs_set_inode_mapping_order() unconditionally inside
btrfs_create_new_inode().

For symbolic links this will fix the crash as now the folio will meet
the minimal order.

For regular files this brings no change.

For directory/bdev/char and all the other types of inodes, they won't
go through the data read path, thus no effect either.

Fixes: cc38d17 ("btrfs: enable large data folio support under CONFIG_BTRFS_EXPERIMENTAL")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit b0b4518 ]

Change the 'ret' variable in blk_stack_limits() from unsigned int to int,
as it needs to store negative value -1.

Storing the negative error codes in unsigned type, or performing equality
comparisons (e.g., ret == -1), doesn't cause an issue at runtime [1] but
can be confusing.  Additionally, assigning negative error codes to unsigned
type may trigger a GCC warning when the -Wsign-conversion flag is enabled.

No effect on runtime.

Link: https://lore.kernel.org/all/x3wogjf6vgpkisdhg3abzrx7v7zktmdnfmqeih5kosszmagqfs@oh3qxrgzkikf/ #1
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Fixes: fe0b393 ("block: Correct handling of bottom device misaligment")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250902130930.68317-1-rongqianfeng@vivo.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit fd2e081 ]

The ns_bpf_qdisc selftest triggers a kernel panic:

    Unable to handle kernel paging request at virtual address ffffffffa38dbf58
    Current test_progs pgtable: 4K pagesize, 57-bit VAs, pgdp=0x00000001109cc000
    [ffffffffa38dbf58] pgd=000000011fffd801, p4d=000000011fffd401, pud=000000011fffd001, pmd=0000000000000000
    Oops [#1]
    Modules linked in: bpf_testmod(OE) xt_conntrack nls_iso8859_1 [...] [last unloaded: bpf_testmod(OE)]
    CPU: 1 UID: 0 PID: 23584 Comm: test_progs Tainted: G        W  OE       6.17.0-rc1-g2465bb83e0b4 #1 NONE
    Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
    Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2024.01+dfsg-1ubuntu5.1 01/01/2024
    epc : __qdisc_run+0x82/0x6f0
     ra : __qdisc_run+0x6e/0x6f0
    epc : ffffffff80bd5c7a ra : ffffffff80bd5c66 sp : ff2000000eecb550
     gp : ffffffff82472098 tp : ff60000096895940 t0 : ffffffff8001f180
     t1 : ffffffff801e1664 t2 : 0000000000000000 s0 : ff2000000eecb5d0
     s1 : ff60000093a6a600 a0 : ffffffffa38dbee8 a1 : 0000000000000001
     a2 : ff2000000eecb510 a3 : 0000000000000001 a4 : 0000000000000000
     a5 : 0000000000000010 a6 : 0000000000000000 a7 : 0000000000735049
     s2 : ffffffffa38dbee8 s3 : 0000000000000040 s4 : ff6000008bcda000
     s5 : 0000000000000008 s6 : ff60000093a6a680 s7 : ff60000093a6a6f0
     s8 : ff60000093a6a6ac s9 : ff60000093140000 s10: 0000000000000000
     s11: ff2000000eecb9d0 t3 : 0000000000000000 t4 : 0000000000ff0000
     t5 : 0000000000000000 t6 : ff60000093a6a8b6
    status: 0000000200000120 badaddr: ffffffffa38dbf58 cause: 000000000000000d
    [<ffffffff80bd5c7a>] __qdisc_run+0x82/0x6f0
    [<ffffffff80b6fe58>] __dev_queue_xmit+0x4c0/0x1128
    [<ffffffff80b80ae0>] neigh_resolve_output+0xd0/0x170
    [<ffffffff80d2daf6>] ip6_finish_output2+0x226/0x6c8
    [<ffffffff80d31254>] ip6_finish_output+0x10c/0x2a0
    [<ffffffff80d31446>] ip6_output+0x5e/0x178
    [<ffffffff80d2e232>] ip6_xmit+0x29a/0x608
    [<ffffffff80d6f4c6>] inet6_csk_xmit+0xe6/0x140
    [<ffffffff80c985e4>] __tcp_transmit_skb+0x45c/0xaa8
    [<ffffffff80c995fe>] tcp_connect+0x9ce/0xd10
    [<ffffffff80d66524>] tcp_v6_connect+0x4ac/0x5e8
    [<ffffffff80cc19b8>] __inet_stream_connect+0xd8/0x318
    [<ffffffff80cc1c36>] inet_stream_connect+0x3e/0x68
    [<ffffffff80b42b20>] __sys_connect_file+0x50/0x88
    [<ffffffff80b42bee>] __sys_connect+0x96/0xc8
    [<ffffffff80b42c40>] __riscv_sys_connect+0x20/0x30
    [<ffffffff80e5bcae>] do_trap_ecall_u+0x256/0x378
    [<ffffffff80e69af2>] handle_exception+0x14a/0x156
    Code: 892a 0363 1205 489c 8bc1 c7e5 2d03 084a 2703 080a (2783) 0709
    ---[ end trace 0000000000000000 ]---

The bpf_fifo_dequeue prog returns a skb which is a pointer. The pointer
is treated as a 32bit value and sign extend to 64bit in epilogue. This
behavior is right for most bpf prog types but wrong for struct ops which
requires RISC-V ABI.

So let's sign extend struct ops return values according to the function
model and RISC-V ABI([0]).

  [0]: https://riscv.org/wp-content/uploads/2024/12/riscv-calling.pdf

Fixes: 25ad106 ("riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/bpf/20250908012448.1695-1-hengqi.chen@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit 0b2cd50 ]

generic/091 may fail, then it bisects to the bad commit ba8dac3
("f2fs: fix to zero post-eof page").

What will cause generic/091 to fail is something like below Testcase #1:
1. write 16k as compressed blocks
2. truncate to 12k
3. truncate to 20k
4. verify data in range of [12k, 16k], however data is not zero as
expected

Script of Testcase #1
mkfs.f2fs -f -O extra_attr,compression /dev/vdb
mount -t f2fs -o compress_extension=* /dev/vdb /mnt/f2fs
dd if=/dev/zero of=/mnt/f2fs/file bs=12k count=1
dd if=/dev/random of=/mnt/f2fs/file bs=4k count=1 seek=3 conv=notrunc
sync
truncate -s $((12*1024)) /mnt/f2fs/file
truncate -s $((20*1024)) /mnt/f2fs/file
dd if=/mnt/f2fs/file of=/mnt/f2fs/data bs=4k count=1 skip=3
od /mnt/f2fs/data
umount /mnt/f2fs

Analisys:
in step 2), we will redirty all data pages from #0 to #3 in compressed
cluster, and zero page #3,
in step 3), f2fs_setattr() will call f2fs_zero_post_eof_page() to drop
all page cache post eof, includeing dirtied page #3,
in step 4) when we read data from page #3, it will decompressed cluster
and extra random data to page #3, finally, we hit the non-zeroed data
post eof.

However, the commit ba8dac3 ("f2fs: fix to zero post-eof page") just
let the issue be reproduced easily, w/o the commit, it can reproduce this
bug w/ below Testcase #2:
1. write 16k as compressed blocks
2. truncate to 8k
3. truncate to 12k
4. truncate to 20k
5. verify data in range of [12k, 16k], however data is not zero as
expected

Script of Testcase #2
mkfs.f2fs -f -O extra_attr,compression /dev/vdb
mount -t f2fs -o compress_extension=* /dev/vdb /mnt/f2fs
dd if=/dev/zero of=/mnt/f2fs/file bs=12k count=1
dd if=/dev/random of=/mnt/f2fs/file bs=4k count=1 seek=3 conv=notrunc
sync
truncate -s $((8*1024)) /mnt/f2fs/file
truncate -s $((12*1024)) /mnt/f2fs/file
truncate -s $((20*1024)) /mnt/f2fs/file
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/f2fs/file of=/mnt/f2fs/data bs=4k count=1 skip=3
od /mnt/f2fs/data
umount /mnt/f2fs

Anlysis:
in step 2), we will redirty all data pages from #0 to #3 in compressed
cluster, and zero page #2 and #3,
in step 3), we will truncate page #3 in page cache,
in step 4), expand file size,
in step 5), hit random data post eof w/ the same reason in Testcase #1.

Root Cause:
In f2fs_truncate_partial_cluster(), after we truncate partial data block
on compressed cluster, all pages in cluster including the one post eof
will be dirtied, after another tuncation, dirty page post eof will be
dropped, however on-disk compressed cluster is still valid, it may
include non-zero data post eof, result in exposing previous non-zero data
post eof while reading.

Fix:
In f2fs_truncate_partial_cluster(), let change as below to fix:
- call filemap_write_and_wait_range() to flush dirty page
- call truncate_pagecache() to drop pages or zero partial page post eof
- call f2fs_do_truncate_blocks() to truncate non-compress cluster to
  last valid block

Fixes: 3265d3d ("f2fs: support partial truncation on compressed inode")
Reported-by: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
…stency()

[ Upstream commit 930a9a6 ]

syzbot reported a f2fs bug as below:

Oops: gen[  107.736417][ T5848] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 5848 Comm: syz-executor263 Tainted: G        W           6.17.0-rc1-syzkaller-00014-g0e39a731820a #0 PREEMPT_{RT,(full)}
RIP: 0010:strcmp+0x3c/0xc0 lib/string.c:284
Call Trace:
 <TASK>
 f2fs_check_quota_consistency fs/f2fs/super.c:1188 [inline]
 f2fs_check_opt_consistency+0x1378/0x2c10 fs/f2fs/super.c:1436
 __f2fs_remount fs/f2fs/super.c:2653 [inline]
 f2fs_reconfigure+0x482/0x1770 fs/f2fs/super.c:5297
 reconfigure_super+0x224/0x890 fs/super.c:1077
 do_remount fs/namespace.c:3314 [inline]
 path_mount+0xd18/0xfe0 fs/namespace.c:4112
 do_mount fs/namespace.c:4133 [inline]
 __do_sys_mount fs/namespace.c:4344 [inline]
 __se_sys_mount+0x317/0x410 fs/namespace.c:4321
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The direct reason is f2fs_check_quota_consistency() may suffer null-ptr-deref
issue in strcmp().

The bug can be reproduced w/ below scripts:
mkfs.f2fs -f /dev/vdb
mount -t f2fs -o usrquota /dev/vdb /mnt/f2fs
quotacheck -uc /mnt/f2fs/
umount /mnt/f2fs
mount -t f2fs -o usrjquota=aquota.user,jqfmt=vfsold /dev/vdb /mnt/f2fs
mount -t f2fs -o remount,usrjquota=,jqfmt=vfsold /dev/vdb /mnt/f2fs
umount /mnt/f2fs

So, before old_qname and new_qname comparison, we need to check whether
they are all valid pointers, fix it.

Reported-by: syzbot+d371efea57d5aeab877b@syzkaller.appspotmail.com
Fixes: d185351 ("f2fs: separate the options parsing and options checking")
Closes: https://lore.kernel.org/linux-f2fs-devel/689ff889.050a0220.e29e5.0037.GAE@google.com
Cc: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit fce6fee ]

Commit 26a8bf9 ("wifi: rtw88: Lock rtwdev->mutex before setting
the LED") made rtw_led_set() sleep, but that's not allowed. Fix it by
using the brightness_set_blocking member of struct led_classdev for
PCI devices too. This one is allowed to sleep.

bad: scheduling from the idle thread!
nix kernel: CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Tainted: G        W  O        6.16.0 #1-NixOS PREEMPT(voluntary)
nix kernel: Tainted: [W]=WARN, [O]=OOT_MODULE
nix kernel: Hardware name: [REDACTED]
nix kernel: Call Trace:
nix kernel:  <IRQ>
nix kernel:  dump_stack_lvl+0x63/0x90
nix kernel:  dequeue_task_idle+0x2d/0x50
nix kernel:  __schedule+0x191/0x1310
nix kernel:  ? xas_load+0x11/0xd0
nix kernel:  schedule+0x2b/0xe0
nix kernel:  schedule_preempt_disabled+0x19/0x30
nix kernel:  __mutex_lock.constprop.0+0x3fd/0x7d0
nix kernel:  rtw_led_set+0x27/0x60 [rtw_core]
nix kernel:  led_blink_set_nosleep+0x56/0xb0
nix kernel:  led_trigger_blink+0x49/0x80
nix kernel:  ? __pfx_tpt_trig_timer+0x10/0x10 [mac80211]
nix kernel:  call_timer_fn+0x2f/0x140
nix kernel:  ? __pfx_tpt_trig_timer+0x10/0x10 [mac80211]
nix kernel:  __run_timers+0x21a/0x2b0
nix kernel:  run_timer_softirq+0x8e/0x100
nix kernel:  handle_softirqs+0xea/0x2c0
nix kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
nix kernel:  __irq_exit_rcu+0xdc/0x100
nix kernel:  sysvec_apic_timer_interrupt+0x7c/0x90
nix kernel:  </IRQ>
nix kernel:  <TASK>
nix kernel:  asm_sysvec_apic_timer_interrupt+0x1a/0x20
nix kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x450
nix kernel: Code: 00 e8 08 7c 2e ff e8 d3 ee ff ff 49 89 c6 0f 1f 44 00 00 31 ff e8 c4 d1 2c ff 80 7d d7 00 0f 85 5d 02 00 00 fb 0f 1f 44 00 00 <45> 85 ff 0f 88 a0 01 00 00 49 63 f7 4c 89 f2 48 8d 0>
nix kernel: RSP: 0018:ffffd579801c7e68 EFLAGS: 00000246
nix kernel: RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
nix kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
nix kernel: RBP: ffffd579801c7ea0 R08: 0000000000000000 R09: 0000000000000000
nix kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8eab0462a400
nix kernel: R13: ffffffff9a7d7a20 R14: 00000003aebe751d R15: 0000000000000003
nix kernel:  ? cpuidle_enter_state+0xbc/0x450
nix kernel:  cpuidle_enter+0x32/0x50
nix kernel:  do_idle+0x1b1/0x210
nix kernel:  cpu_startup_entry+0x2d/0x30
nix kernel:  start_secondary+0x118/0x140
nix kernel:  common_startup_64+0x13e/0x141
nix kernel:  </TASK>

Fixes: 26a8bf9 ("wifi: rtw88: Lock rtwdev->mutex before setting the LED")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/ad8a49ef-4f2d-4a61-8292-952db9c4eb65@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit 6fc6167 ]

Current code will validate current plane and previous plane to
confirm they can share a SSPP with multi-rect mode. The SSPP
is already allocated for previous plane, while current plane
is not associated with any SSPP yet. Null pointer is referenced
when validating the SSPP of current plane. Skip SSPP validation
for current plane.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000888ac3000
[0000000000000020] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1]  SMP
Modules linked in:
CPU: 4 UID: 0 PID: 1891 Comm: modetest Tainted: G S                  6.15.0-rc2-g3ee3f6e1202e torvalds#335 PREEMPT
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: SM8650 EV1 rev1 4slam 2et (DT)
pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : dpu_plane_is_multirect_capable+0x68/0x90
lr : dpu_assign_plane_resources+0x288/0x410
sp : ffff800093dcb770
x29: ffff800093dcb770 x28: 0000000000002000 x27: ffff000817c6c000
x26: ffff000806b46368 x25: ffff0008013f6080 x24: ffff00080cbf4800
x23: ffff000810842680 x22: ffff0008013f1080 x21: ffff00080cc86080
x20: ffff000806b463b0 x19: ffff00080cbf5a00 x18: 00000000ffffffff
x17: 707a5f657a696c61 x16: 0000000000000003 x15: 0000000000002200
x14: 00000000ffffffff x13: 00aaaaaa00aaaaaa x12: 0000000000000000
x11: ffff000817c6e2b8 x10: 0000000000000000 x9 : ffff80008106a950
x8 : ffff00080cbf48f4 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000438 x3 : 0000000000000438
x2 : ffff800082e245e0 x1 : 0000000000000008 x0 : 0000000000000000
Call trace:
 dpu_plane_is_multirect_capable+0x68/0x90 (P)
 dpu_crtc_atomic_check+0x5bc/0x650
 drm_atomic_helper_check_planes+0x13c/0x220
 drm_atomic_helper_check+0x58/0xb8
 msm_atomic_check+0xd8/0xf0
 drm_atomic_check_only+0x4a8/0x968
 drm_atomic_commit+0x50/0xd8
 drm_atomic_helper_update_plane+0x140/0x188
 __setplane_atomic+0xfc/0x148
 drm_mode_setplane+0x164/0x378
 drm_ioctl_kernel+0xc0/0x140
 drm_ioctl+0x20c/0x500
 __arm64_sys_ioctl+0xbc/0xf8
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0x48/0xf8
 do_el0_svc+0x28/0x40
 el0_svc+0x30/0xd0
 el0t_64_sync_handler+0x144/0x168
 el0t_64_sync+0x198/0x1a0
Code: b9402021 370fffc1 f9401441 3707ff81 (f94010a1)
---[ end trace 0000000000000000 ]---

Fixes: 3ed12a3 ("drm/msm/dpu: allow sharing SSPP between planes")
Signed-off-by: Jun Nie <jun.nie@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/669224/
Link: https://lore.kernel.org/r/20250819-v6-16-rc2-quad-pipe-upstream-v15-1-2c7a85089db8@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
…referencing

[ Upstream commit 62e59ff ]

The function do_fanotify_mark() does not validate if
mnt_ns_from_dentry() returns NULL before dereferencing mntns->user_ns.
This causes a NULL pointer dereference in do_fanotify_mark() if the
path is not a mount namespace object.

Fix this by checking mnt_ns_from_dentry()'s return value before
dereferencing it.

Before the patch

$ gcc fanotify_nullptr.c -o fanotify_nullptr
$ mkdir A
$ ./fanotify_nullptr
Fanotify fd: 3
fanotify_mark: Operation not permitted
$ unshare -Urm
Fanotify fd: 3
Killed

int main(void){
    int ffd;
    ffd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_MNT, 0);
    if(ffd < 0){
        perror("fanotify_init");
        exit(EXIT_FAILURE);
    }

    printf("Fanotify fd: %d\n",ffd);

    if(fanotify_mark(ffd, FAN_MARK_ADD | FAN_MARK_MNTNS,
FAN_MNT_ATTACH, AT_FDCWD, "A") < 0){
        perror("fanotify_mark");
        exit(EXIT_FAILURE);
    }

return 0;
}

After the patch

$ gcc fanotify_nullptr.c -o fanotify_nullptr
$ mkdir A
$ ./fanotify_nullptr
Fanotify fd: 3
fanotify_mark: Operation not permitted
$ unshare -Urm
Fanotify fd: 3
fanotify_mark: Invalid argument

[   25.694973] BUG: kernel NULL pointer dereference, address: 0000000000000038
[   25.695006] #PF: supervisor read access in kernel mode
[   25.695012] #PF: error_code(0x0000) - not-present page
[   25.695017] PGD 109a30067 P4D 109a30067 PUD 142b46067 PMD 0
[   25.695025] Oops: Oops: 0000 [#1] SMP NOPTI
[   25.695032] CPU: 4 UID: 1000 PID: 1478 Comm: fanotify_nullpt Not
tainted 6.17.0-rc4 #1 PREEMPT(lazy)
[   25.695040] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[   25.695049] RIP: 0010:do_fanotify_mark+0x817/0x950
[   25.695066] Code: 04 00 00 e9 45 fd ff ff 48 8b 7c 24 48 4c 89 54
24 18 4c 89 5c 24 10 4c 89 0c 24 e8 b3 11 fc ff 4c 8b 54 24 18 4c 8b
5c 24 10 <48> 8b 78 38 4c 8b 0c 24 49 89 c4 e9 13 fd ff ff 8b 4c 24 28
85 c9
[   25.695081] RSP: 0018:ffffd31c469e3c08 EFLAGS: 00010203
[   25.695104] RAX: 0000000000000000 RBX: 0000000001000000 RCX: ffff8eb48aebd220
[   25.695110] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8eb4835e8180
[   25.695115] RBP: 0000000000000111 R08: 0000000000000000 R09: 0000000000000000
[   25.695142] R10: ffff8eb48a7d56c0 R11: ffff8eb482bede00 R12: 00000000004012a7
[   25.695148] R13: 0000000000000110 R14: 0000000000000001 R15: ffff8eb48a7d56c0
[   25.695154] FS:  00007f8733bda740(0000) GS:ffff8eb61ce5f000(0000)
knlGS:0000000000000000
[   25.695162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   25.695170] CR2: 0000000000000038 CR3: 0000000136994006 CR4: 00000000003706f0
[   25.695201] Call Trace:
[   25.695209]  <TASK>
[   25.695215]  __x64_sys_fanotify_mark+0x1f/0x30
[   25.695222]  do_syscall_64+0x82/0x2c0
...

Fixes: 58f5fbe ("fanotify: support watching filesystems and mounts inside userns")
Link: https://patch.msgid.link/CAPhRvkw4ONypNsJrCnxbKnJbYmLHTDEKFC4C_num_5sVBVa8jg@mail.gmail.com
Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit b0531cd ]

Similar to previous commit 2a934fd ("media: v4l2-dev: fix error
handling in __video_register_device()"), the release hook should be set
before device_register(). Otherwise, when device_register() return error
and put_device() try to callback the release function, the below warning
may happen.

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 4760 at drivers/base/core.c:2567 device_release+0x1bd/0x240 drivers/base/core.c:2567
  Modules linked in:
  CPU: 1 UID: 0 PID: 4760 Comm: syz.4.914 Not tainted 6.17.0-rc3+ #1 NONE
  RIP: 0010:device_release+0x1bd/0x240 drivers/base/core.c:2567
  Call Trace:
   <TASK>
   kobject_cleanup+0x136/0x410 lib/kobject.c:689
   kobject_release lib/kobject.c:720 [inline]
   kref_put include/linux/kref.h:65 [inline]
   kobject_put+0xe9/0x130 lib/kobject.c:737
   put_device+0x24/0x30 drivers/base/core.c:3797
   pps_register_cdev+0x2da/0x370 drivers/pps/pps.c:402
   pps_register_source+0x2f6/0x480 drivers/pps/kapi.c:108
   pps_tty_open+0x190/0x310 drivers/pps/clients/pps-ldisc.c:57
   tty_ldisc_open+0xa7/0x120 drivers/tty/tty_ldisc.c:432
   tty_set_ldisc+0x333/0x780 drivers/tty/tty_ldisc.c:563
   tiocsetd drivers/tty/tty_io.c:2429 [inline]
   tty_ioctl+0x5d1/0x1700 drivers/tty/tty_io.c:2728
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:598 [inline]
   __se_sys_ioctl fs/ioctl.c:584 [inline]
   __x64_sys_ioctl+0x194/0x210 fs/ioctl.c:584
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x5f/0x2a0 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

Before commit c79a39d ("pps: Fix a use-after-free"),
pps_register_cdev() call device_create() to create pps->dev, which will
init dev->release to device_create_release(). Now the comment is outdated,
just remove it.

Thanks for the reminder from Calvin Owens, 'kfree_pps' should be removed
in pps_register_source() to avoid a double free in the failure case.

Link: https://lore.kernel.org/all/20250827065010.3208525-1-wangliang74@huawei.com/
Fixes: c79a39d ("pps: Fix a use-after-free")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-By: Calvin Owens <calvin@wbinvd.org>
Link: https://lore.kernel.org/r/20250830075023.3498174-1-wangliang74@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit f028bca ]

The drm_gem_for_each_gpuvm_bo() call from lookup_vma() accesses
drm_gem_obj.gpuva.list, which is not initialized when the drm driver
does not support DRIVER_GEM_GPUVA feature. Enable it for msm_kms
drm driver to fix the splat seen when msm.separate_gpu_drm=1 modparam
is set:

[    9.506020] Unable to handle kernel paging request at virtual address fffffffffffffff0
[    9.523160] Mem abort info:
[    9.523161]   ESR = 0x0000000096000006
[    9.523163]   EC = 0x25: DABT (current EL), IL = 32 bits
[    9.523165]   SET = 0, FnV = 0
[    9.523166]   EA = 0, S1PTW = 0
[    9.523167]   FSC = 0x06: level 2 translation fault
[    9.523169] Data abort info:
[    9.523170]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[    9.523171]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    9.523172]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    9.523174] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000ad370f000
[    9.523176] [fffffffffffffff0] pgd=0000000000000000, p4d=0000000ad4787403, pud=0000000ad4788403, pmd=0000000000000000
[    9.523184] Internal error: Oops: 0000000096000006 [#1]  SMP
[    9.592968] CPU: 9 UID: 0 PID: 448 Comm: (udev-worker) Not tainted 6.17.0-rc4-assorted-fix-00005-g0e9bb53a2282-dirty #3 PREEMPT
[    9.592970] Hardware name: Qualcomm CRD, BIOS 6.0.240718.BOOT.MXF.2.4-00515-HAMOA-1 07/18/2024
[    9.592971] pstate: a140000 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    9.592973] pc : lookup_vma+0x28/0xe0 [msm]
[    9.592996] lr : get_vma_locked+0x2c/0x128 [msm]
[    9.763632] sp : ffff800082dab460
[    9.763666] Call trace:
[    9.763668]  lookup_vma+0x28/0xe0 [msm] (P)
[    9.763688]  get_vma_locked+0x2c/0x128 [msm]
[    9.763706]  msm_gem_get_and_pin_iova_range+0x68/0x11c [msm]
[    9.763723]  msm_gem_get_and_pin_iova+0x18/0x24 [msm]
[    9.763740]  msm_fbdev_driver_fbdev_probe+0xd0/0x258 [msm]
[    9.763760]  __drm_fb_helper_initial_config_and_unlock+0x288/0x528 [drm_kms_helper]
[    9.763771]  drm_fb_helper_initial_config+0x44/0x54 [drm_kms_helper]
[    9.763779]  drm_fbdev_client_hotplug+0x84/0xd4 [drm_client_lib]
[    9.763782]  drm_client_register+0x58/0x9c [drm]
[    9.763806]  drm_fbdev_client_setup+0xe8/0xcf0 [drm_client_lib]
[    9.763809]  drm_client_setup+0xb4/0xd8 [drm_client_lib]
[    9.763811]  msm_drm_kms_post_init+0x2c/0x3c [msm]
[    9.763830]  msm_drm_init+0x1a8/0x22c [msm]
[    9.763848]  msm_drm_bind+0x30/0x3c [msm]
[    9.919273]  try_to_bring_up_aggregate_device+0x168/0x1d4
[    9.919283]  __component_add+0xa4/0x170
[    9.919286]  component_add+0x14/0x20
[    9.919288]  msm_dp_display_probe_tail+0x4c/0xac [msm]
[    9.919315]  msm_dp_auxbus_done_probe+0x14/0x20 [msm]
[    9.919335]  dp_aux_ep_probe+0x4c/0xf0 [drm_dp_aux_bus]
[    9.919341]  really_probe+0xbc/0x298
[    9.919345]  __driver_probe_device+0x78/0x12c
[    9.919348]  driver_probe_device+0x40/0x160
[    9.919350]  __driver_attach+0x94/0x19c
[    9.919353]  bus_for_each_dev+0x74/0xd4
[    9.919355]  driver_attach+0x24/0x30
[    9.919358]  bus_add_driver+0xe4/0x208
[    9.919360]  driver_register+0x60/0x128
[    9.919363]  __dp_aux_dp_driver_register+0x24/0x30 [drm_dp_aux_bus]
[    9.919365]  atana33xc20_init+0x20/0x1000 [panel_samsung_atna33xc20]
[    9.919370]  do_one_initcall+0x6c/0x1b0
[    9.919374]  do_init_module+0x58/0x234
[    9.919377]  load_module+0x19cc/0x1bd4
[    9.919380]  init_module_from_file+0x84/0xc4
[    9.919382]  __arm64_sys_finit_module+0x1b8/0x2cc
[    9.919384]  invoke_syscall+0x48/0x110
[    9.919389]  el0_svc_common.constprop.0+0xc8/0xe8
[    9.919393]  do_el0_svc+0x20/0x2c
[    9.919396]  el0_svc+0x34/0xf0
[    9.919401]  el0t_64_sync_handler+0xa0/0xe4
[    9.919403]  el0t_64_sync+0x198/0x19c
[    9.919407] Code: eb0000bf 54000480 d100a003 aa0303e2 (f8418c44)
[    9.919410] ---[ end trace 0000000000000000 ]---

Fixes: 217ed15 ("drm/msm: enable separate binding of GPU and display devices")
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/672257/
Link: https://lore.kernel.org/r/20250902-assorted-sept-1-v1-1-f3ec9baed513@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit fbe6070 ]

In legacy mode, SSPTPTR is ignored if TT is not 00b or 01b. SSPTPTR
maybe uninitialized or zero in that case and may cause oops like:

 Oops: general protection fault, probably for non-canonical address
       0xf00087d3f000f000: 0000 [#1] SMP NOPTI
 CPU: 2 UID: 0 PID: 786 Comm: cat Not tainted 6.16.0 torvalds#191 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014
 RIP: 0010:pgtable_walk_level+0x98/0x150
 RSP: 0018:ffffc90000f279c0 EFLAGS: 00010206
 RAX: 0000000040000000 RBX: ffffc90000f27ab0 RCX: 000000000000001e
 RDX: 0000000000000003 RSI: f00087d3f000f000 RDI: f00087d3f0010000
 RBP: ffffc90000f27a00 R08: ffffc90000f27a98 R09: 0000000000000002
 R10: 0000000000000000 R11: 0000000000000000 R12: f00087d3f000f000
 R13: 0000000000000000 R14: 0000000040000000 R15: ffffc90000f27a98
 FS:  0000764566dcb740(0000) GS:ffff8881f812c000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000764566d44000 CR3: 0000000109d81003 CR4: 0000000000772ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  pgtable_walk_level+0x88/0x150
  domain_translation_struct_show.isra.0+0x2d9/0x300
  dev_domain_translation_struct_show+0x20/0x40
  seq_read_iter+0x12d/0x490
...

Avoid walking the page table if TT is not 00b or 01b.

Fixes: 2b437e8 ("iommu/vt-d: debugfs: Support dumping a specified page table")
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250814163153.634680-1-vineeth@bitbyteword.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit 570f945 ]

Lockdep gives a splat [1] when ser_hdl_work item is executed.  It is
scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a
wiphy lock inside.  However, this workqueue can be flushed when e.g.
closing the interface and wiphy lock is already taken in that case.

Choosing wiphy_work_queue() for SER is likely not suitable.  Back on to
the global workqueue.

[1]:

 WARNING: possible circular locking dependency detected
 6.17.0-rc2 torvalds#17 Not tainted
 ------------------------------------------------------
 kworker/u32:1/61 is trying to acquire lock:
 ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core]

 but task is already holding lock:
 ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}:
        process_one_work+0x7c6/0x1450
        worker_thread+0x49e/0xd00
        kthread+0x313/0x640
        ret_from_fork+0x221/0x300
        ret_from_fork_asm+0x1a/0x30

 -> #1 ((wq_completion)phy0){+.+.}-{0:0}:
        touch_wq_lockdep_map+0x8e/0x180
        __flush_workqueue+0x129/0x10d0
        ieee80211_stop_device+0xa8/0x110
        ieee80211_do_stop+0x14ce/0x2880
        ieee80211_stop+0x13a/0x2c0
        __dev_close_many+0x18f/0x510
        __dev_change_flags+0x25f/0x670
        netif_change_flags+0x7b/0x160
        do_setlink.isra.0+0x1640/0x35d0
        rtnl_newlink+0xd8c/0x1d30
        rtnetlink_rcv_msg+0x700/0xb80
        netlink_rcv_skb+0x11d/0x350
        netlink_unicast+0x49a/0x7a0
        netlink_sendmsg+0x759/0xc20
        ____sys_sendmsg+0x812/0xa00
        ___sys_sendmsg+0xf7/0x180
        __sys_sendmsg+0x11f/0x1b0
        do_syscall_64+0xbb/0x360
        entry_SYSCALL_64_after_hwframe+0x77/0x7f

 -> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
        __lock_acquire+0x124c/0x1d20
        lock_acquire+0x154/0x2e0
        __mutex_lock+0x17b/0x12f0
        ser_state_run+0x5e/0x180 [rtw89_core]
        rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
        process_one_work+0x82d/0x1450
        worker_thread+0x49e/0xd00
        kthread+0x313/0x640
        ret_from_fork+0x221/0x300
        ret_from_fork_asm+0x1a/0x30

 other info that might help us debug this:

 Chain exists of:
   &rdev->wiphy.mtx --> (wq_completion)phy0 --> (work_completion)(&ser->ser_hdl_work)

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock((work_completion)(&ser->ser_hdl_work));
                                lock((wq_completion)phy0);
                                lock((work_completion)(&ser->ser_hdl_work));
   lock(&rdev->wiphy.mtx);

  *** DEADLOCK ***

 2 locks held by kworker/u32:1/61:
  #0: ffff888103835148 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0xefa/0x1450
  #1: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450

 stack backtrace:
 CPU: 0 UID: 0 PID: 61 Comm: kworker/u32:1 Not tainted 6.17.0-rc2 torvalds#17 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
 Workqueue: phy0 rtw89_ser_hdl_work [rtw89_core]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5d/0x80
  print_circular_bug.cold+0x178/0x1be
  check_noncircular+0x14c/0x170
  __lock_acquire+0x124c/0x1d20
  lock_acquire+0x154/0x2e0
  __mutex_lock+0x17b/0x12f0
  ser_state_run+0x5e/0x180 [rtw89_core]
  rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
  process_one_work+0x82d/0x1450
  worker_thread+0x49e/0xd00
  kthread+0x313/0x640
  ret_from_fork+0x221/0x300
  ret_from_fork_asm+0x1a/0x30
  </TASK>

Found by Linux Verification Center (linuxtesting.org).

Fixes: ebfc919 ("wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-5-pchelkin@ispras.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit 1703fe4 ]

During mpt3sas_transport_port_remove(), messages were logged with
dev_printk() against &mpt3sas_port->port->dev. At this point the SAS
transport device may already be partially unregistered or freed, leading
to a crash when accessing its struct device.

Using ioc_info(), which logs via the PCI device (ioc->pdev->dev),
guaranteed to remain valid until driver removal.

[83428.295776] Oops: general protection fault, probably for non-canonical address 0x6f702f323a33312d: 0000 [#1] SMP NOPTI
[83428.295785] CPU: 145 UID: 0 PID: 113296 Comm: rmmod Kdump: loaded Tainted: G           OE       6.16.0-rc1+ #1 PREEMPT(voluntary)
[83428.295792] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[83428.295795] Hardware name: Dell Inc. Precision 7875 Tower/, BIOS 89.1.67 02/23/2024
[83428.295799] RIP: 0010:__dev_printk+0x1f/0x70
[83428.295805] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 49 89 d1 48 85 f6 74 52 4c 8b 46 50 4d 85 c0 74 1f 48 8b 46 68 48 85 c0 74 22 <48> 8b 08 0f b6 7f 01 48 c7 c2 db e8 42 ad 83 ef 30 e9 7b f8 ff ff
[83428.295813] RSP: 0018:ff85aeafc3137bb0 EFLAGS: 00010206
[83428.295817] RAX: 6f702f323a33312d RBX: ff4290ee81292860 RCX: 5000cca25103be32
[83428.295820] RDX: ff85aeafc3137bb8 RSI: ff4290eeb1966c00 RDI: ffffffffc1560845
[83428.295823] RBP: ff85aeafc3137c18 R08: 74726f702f303a33 R09: ff85aeafc3137bb8
[83428.295826] R10: ff85aeafc3137b18 R11: ff4290f5bd60fe68 R12: ff4290ee81290000
[83428.295830] R13: ff4290ee6e345de0 R14: ff4290ee81290000 R15: ff4290ee6e345e30
[83428.295833] FS:  00007fd9472a6740(0000) GS:ff4290f5ce96b000(0000) knlGS:0000000000000000
[83428.295837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83428.295840] CR2: 00007f242b4db238 CR3: 00000002372b8006 CR4: 0000000000771ef0
[83428.295844] PKRU: 55555554
[83428.295846] Call Trace:
[83428.295848]  <TASK>
[83428.295850]  _dev_printk+0x5c/0x80
[83428.295857]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.295863]  mpt3sas_transport_port_remove+0x1c7/0x420 [mpt3sas]
[83428.295882]  _scsih_remove_device+0x21b/0x280 [mpt3sas]
[83428.295894]  ? _scsih_expander_node_remove+0x108/0x140 [mpt3sas]
[83428.295906]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.295910]  mpt3sas_device_remove_by_sas_address.part.0+0x8f/0x110 [mpt3sas]
[83428.295921]  _scsih_expander_node_remove+0x129/0x140 [mpt3sas]
[83428.295933]  _scsih_expander_node_remove+0x6a/0x140 [mpt3sas]
[83428.295944]  scsih_remove+0x3f0/0x4a0 [mpt3sas]
[83428.295957]  pci_device_remove+0x3b/0xb0
[83428.295962]  device_release_driver_internal+0x193/0x200
[83428.295968]  driver_detach+0x44/0x90
[83428.295971]  bus_remove_driver+0x69/0xf0
[83428.295975]  pci_unregister_driver+0x2a/0xb0
[83428.295979]  _mpt3sas_exit+0x1f/0x300 [mpt3sas]
[83428.295991]  __do_sys_delete_module.constprop.0+0x174/0x310
[83428.295997]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.296000]  ? __x64_sys_getdents64+0x9a/0x110
[83428.296005]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.296009]  ? syscall_trace_enter+0xf6/0x1b0
[83428.296014]  do_syscall_64+0x7b/0x2c0
[83428.296019]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.296023]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: f92363d ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit edf7e90 ]

As JY reported in bugzilla [1],

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
pc : [0xffffffe51d249484] f2fs_is_cp_guaranteed+0x70/0x98
lr : [0xffffffe51d24adbc] f2fs_merge_page_bio+0x520/0x6d4
CPU: 3 UID: 0 PID: 6790 Comm: kworker/u16:3 Tainted: P    B   W  OE      6.12.30-android16-5-maybe-dirty-4k #1 5f7701c9cbf727d1eebe77c89bbbeb3371e895e5
Tainted: [P]=PROPRIETARY_MODULE, [B]=BAD_PAGE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Workqueue: writeback wb_workfn (flush-254:49)
Call trace:
 f2fs_is_cp_guaranteed+0x70/0x98
 f2fs_inplace_write_data+0x174/0x2f4
 f2fs_do_write_data_page+0x214/0x81c
 f2fs_write_single_data_page+0x28c/0x764
 f2fs_write_data_pages+0x78c/0xce4
 do_writepages+0xe8/0x2fc
 __writeback_single_inode+0x4c/0x4b4
 writeback_sb_inodes+0x314/0x540
 __writeback_inodes_wb+0xa4/0xf4
 wb_writeback+0x160/0x448
 wb_workfn+0x2f0/0x5dc
 process_scheduled_works+0x1c8/0x458
 worker_thread+0x334/0x3f0
 kthread+0x118/0x1ac
 ret_from_fork+0x10/0x20

[1] https://bugzilla.kernel.org/show_bug.cgi?id=220575

The panic was caused by UAF issue w/ below race condition:

kworker
- writepages
 - f2fs_write_cache_pages
  - f2fs_write_single_data_page
   - f2fs_do_write_data_page
    - f2fs_inplace_write_data
     - f2fs_merge_page_bio
      - add_inu_page
      : cache page #1 into bio & cache bio in
        io->bio_list
  - f2fs_write_single_data_page
   - f2fs_do_write_data_page
    - f2fs_inplace_write_data
     - f2fs_merge_page_bio
      - add_inu_page
      : cache page #2 into bio which is linked
        in io->bio_list
						write
						- f2fs_write_begin
						: write page #1
						 - f2fs_folio_wait_writeback
						  - f2fs_submit_merged_ipu_write
						   - f2fs_submit_write_bio
						   : submit bio which inclues page #1 and #2

						software IRQ
						- f2fs_write_end_io
						 - fscrypt_free_bounce_page
						 : freed bounced page which belongs to page #2
      - inc_page_count( , WB_DATA_TYPE(data_folio), false)
      : data_folio points to fio->encrypted_page
        the bounced page can be freed before
        accessing it in f2fs_is_cp_guarantee()

It can reproduce w/ below testcase:
Run below script in shell #1:
for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \
-c "pwrite 0 32k" -c "fdatasync"

Run below script in shell #2:
for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \
-c "pwrite 0 32k" -c "fdatasync"

So, in f2fs_merge_page_bio(), let's avoid using fio->encrypted_page after
commit page into internal ipu cache.

Fixes: 0b20fce ("f2fs: cache global IPU bio")
Reported-by: JY <JY.Ho@mediatek.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
[ Upstream commit 998a67b ]

The crypto API, through the scatterlist API, expects input buffers to be
in linear memory.  We handle this with the cifs_sg_set_buf() helper
that converts vmalloc'd memory to their corresponding pages.

However, when we allocate our aead_request buffer (@creq in
smb2ops.c::crypt_message()), we do so with kvzalloc(), which possibly
puts aead_request->__ctx in vmalloc area.

AEAD algorithm then uses ->__ctx for its private/internal data and
operations, and uses sg_set_buf() for such data on a few places.

This works fine as long as @creq falls into kmalloc zone (small
requests) or vmalloc'd memory is still within linear range.

Tasks' stacks are vmalloc'd by default (CONFIG_VMAP_STACK=y), so too
many tasks will increment the base stacks' addresses to a point where
virt_addr_valid(buf) will fail (BUG() in sg_set_buf()) when that
happens.

In practice: too many parallel reads and writes on an encrypted mount
will trigger this bug.

To fix this, always alloc @creq with kmalloc() instead.
Also drop the @sensitive_size variable/arguments since
kfree_sensitive() doesn't need it.

Backtrace:

[  945.272081] ------------[ cut here ]------------
[  945.272774] kernel BUG at include/linux/scatterlist.h:209!
[  945.273520] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[  945.274412] CPU: 7 UID: 0 PID: 56 Comm: kworker/u33:0 Kdump: loaded Not tainted 6.15.0-lku-11779-g8e9d6efccdd7-dirty #1 PREEMPT(voluntary)
[  945.275736] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
[  945.276877] Workqueue: writeback wb_workfn (flush-cifs-2)
[  945.277457] RIP: 0010:crypto_gcm_init_common+0x1f9/0x220
[  945.278018] Code: b0 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 c7 c0 00 00 00 80 48 2b 05 5c 58 e5 00 e9 58 ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 48 c7 04 24 01 00 00 00 48 8b
[  945.279992] RSP: 0018:ffffc90000a27360 EFLAGS: 00010246
[  945.280578] RAX: 0000000000000000 RBX: ffffc90001d85060 RCX: 0000000000000030
[  945.281376] RDX: 0000000000080000 RSI: 0000000000000000 RDI: ffffc90081d85070
[  945.282145] RBP: ffffc90001d85010 R08: ffffc90001d85000 R09: 0000000000000000
[  945.282898] R10: ffffc90001d85090 R11: 0000000000001000 R12: ffffc90001d85070
[  945.283656] R13: ffff888113522948 R14: ffffc90001d85060 R15: ffffc90001d85010
[  945.284407] FS:  0000000000000000(0000) GS:ffff8882e66cf000(0000) knlGS:0000000000000000
[  945.285262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  945.285884] CR2: 00007fa7ffdd31f4 CR3: 000000010540d000 CR4: 0000000000350ef0
[  945.286683] Call Trace:
[  945.286952]  <TASK>
[  945.287184]  ? crypt_message+0x33f/0xad0 [cifs]
[  945.287719]  crypto_gcm_encrypt+0x36/0xe0
[  945.288152]  crypt_message+0x54a/0xad0 [cifs]
[  945.288724]  smb3_init_transform_rq+0x277/0x300 [cifs]
[  945.289300]  smb_send_rqst+0xa3/0x160 [cifs]
[  945.289944]  cifs_call_async+0x178/0x340 [cifs]
[  945.290514]  ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[  945.291177]  smb2_async_writev+0x3e3/0x670 [cifs]
[  945.291759]  ? find_held_lock+0x32/0x90
[  945.292212]  ? netfs_advance_write+0xf2/0x310
[  945.292723]  netfs_advance_write+0xf2/0x310
[  945.293210]  netfs_write_folio+0x346/0xcc0
[  945.293689]  ? __pfx__raw_spin_unlock_irq+0x10/0x10
[  945.294250]  netfs_writepages+0x117/0x460
[  945.294724]  do_writepages+0xbe/0x170
[  945.295152]  ? find_held_lock+0x32/0x90
[  945.295600]  ? kvm_sched_clock_read+0x11/0x20
[  945.296103]  __writeback_single_inode+0x56/0x4b0
[  945.296643]  writeback_sb_inodes+0x229/0x550
[  945.297140]  __writeback_inodes_wb+0x4c/0xe0
[  945.297642]  wb_writeback+0x2f1/0x3f0
[  945.298069]  wb_workfn+0x300/0x490
[  945.298472]  process_one_work+0x1fe/0x590
[  945.298949]  worker_thread+0x1ce/0x3c0
[  945.299397]  ? __pfx_worker_thread+0x10/0x10
[  945.299900]  kthread+0x119/0x210
[  945.300285]  ? __pfx_kthread+0x10/0x10
[  945.300729]  ret_from_fork+0x119/0x1b0
[  945.301163]  ? __pfx_kthread+0x10/0x10
[  945.301601]  ret_from_fork_asm+0x1a/0x30
[  945.302055]  </TASK>

Fixes: d08089f ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
commit 8d33a03 upstream.

There is a race condition between dm device suspend and table load that
can lead to null pointer dereference. The issue occurs when suspend is
invoked before table load completes:

BUG: kernel NULL pointer dereference, address: 0000000000000054
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 6798 Comm: dmsetup Not tainted 6.6.0-g7e52f5f0ca9b torvalds#62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
RIP: 0010:blk_mq_wait_quiesce_done+0x0/0x50
Call Trace:
  <TASK>
  blk_mq_quiesce_queue+0x2c/0x50
  dm_stop_queue+0xd/0x20
  __dm_suspend+0x130/0x330
  dm_suspend+0x11a/0x180
  dev_suspend+0x27e/0x560
  ctl_ioctl+0x4cf/0x850
  dm_ctl_ioctl+0xd/0x20
  vfs_ioctl+0x1d/0x50
  __se_sys_ioctl+0x9b/0xc0
  __x64_sys_ioctl+0x19/0x30
  x64_sys_call+0x2c4a/0x4620
  do_syscall_64+0x9e/0x1b0

The issue can be triggered as below:

T1 						T2
dm_suspend					table_load
__dm_suspend					dm_setup_md_queue
						dm_mq_init_request_queue
						blk_mq_init_allocated_queue
						=> q->mq_ops = set->ops; (1)
dm_stop_queue / dm_wait_for_completion
=> q->tag_set NULL pointer!	(2)
						=> q->tag_set = set; (3)

Fix this by checking if a valid table (map) exists before performing
request-based suspend and waiting for target I/O. When map is NULL,
skip these table-dependent suspend steps.

Even when map is NULL, no I/O can reach any target because there is
no table loaded; I/O submitted in this state will fail early in the
DM layer. Skipping the table-dependent suspend logic in this case
is safe and avoids NULL pointer dereferences.

Fixes: c4576ae ("dm: fix request-based dm's use of dm_wait_for_completion")
Cc: stable@vger.kernel.org
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
commit 8b51b11 upstream.

The ns_bpf_qdisc selftest triggers a kernel panic:

  Oops[#1]:
  CPU 0 Unable to handle kernel paging request at virtual address 0000000000741d58, era == 90000000851b5ac0, ra == 90000000851b5aa4
  CPU: 0 UID: 0 PID: 449 Comm: test_progs Tainted: G           OE       6.16.0+ #3 PREEMPT(full)
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
  pc 90000000851b5ac0 ra 90000000851b5aa4 tp 90000001076b8000 sp 90000001076bb600
  a0 0000000000741ce8 a1 0000000000000001 a2 90000001076bb5c0 a3 0000000000000008
  a4 90000001004c4620 a5 9000000100741ce8 a6 0000000000000000 a7 0100000000000000
  t0 0000000000000010 t1 0000000000000000 t2 9000000104d24d30 t3 0000000000000001
  t4 4f2317da8a7e08c4 t5 fffffefffc002f00 t6 90000001004c4620 t7 ffffffffc61c5b3d
  t8 0000000000000000 u0 0000000000000001 s9 0000000000000050 s0 90000001075bc800
  s1 0000000000000040 s2 900000010597c400 s3 0000000000000008 s4 90000001075bc880
  s5 90000001075bc8f0 s6 0000000000000000 s7 0000000000741ce8 s8 0000000000000000
     ra: 90000000851b5aa4 __qdisc_run+0xac/0x8d8
    ERA: 90000000851b5ac0 __qdisc_run+0xc8/0x8d8
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 00000004 (PPLV0 +PIE -PWE)
   EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
   BADV: 0000000000741d58
   PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
  Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
  Process test_progs (pid: 449, threadinfo=000000009af02b3a, task=00000000e9ba4956)
  Stack : 0000000000000000 90000001075bc8ac 90000000869524a8 9000000100741ce8
          90000001075bc800 9000000100415300 90000001075bc8ac 0000000000000000
          900000010597c400 900000008694a000 0000000000000000 9000000105b59000
          90000001075bc800 9000000100741ce8 0000000000000050 900000008513000c
          9000000086936000 0000000100094d4c fffffff400676208 0000000000000000
          9000000105b59000 900000008694a000 9000000086bf0dc0 9000000105b59000
          9000000086bf0d68 9000000085147010 90000001075be788 0000000000000000
          9000000086bf0f98 0000000000000001 0000000000000010 9000000006015840
          0000000000000000 9000000086be6c40 0000000000000000 0000000000000000
          0000000000000000 4f2317da8a7e08c4 0000000000000101 4f2317da8a7e08c4
          ...
  Call Trace:
  [<90000000851b5ac0>] __qdisc_run+0xc8/0x8d8
  [<9000000085130008>] __dev_queue_xmit+0x578/0x10f0
  [<90000000853701c0>] ip6_finish_output2+0x2f0/0x950
  [<9000000085374bc8>] ip6_finish_output+0x2b8/0x448
  [<9000000085370b24>] ip6_xmit+0x304/0x858
  [<90000000853c4438>] inet6_csk_xmit+0x100/0x170
  [<90000000852b32f0>] __tcp_transmit_skb+0x490/0xdd0
  [<90000000852b47fc>] tcp_connect+0xbcc/0x1168
  [<90000000853b9088>] tcp_v6_connect+0x580/0x8a0
  [<90000000852e7738>] __inet_stream_connect+0x170/0x480
  [<90000000852e7a98>] inet_stream_connect+0x50/0x88
  [<90000000850f2814>] __sys_connect+0xe4/0x110
  [<90000000850f2858>] sys_connect+0x18/0x28
  [<9000000085520c94>] do_syscall+0x94/0x1a0
  [<9000000083df1fb8>] handle_syscall+0xb8/0x158

  Code: 4001ad80  2400873  2400832d <240073cc> 001137ff  001133ff  6407b41f  001503cc  0280041d

  ---[ end trace 0000000000000000 ]---

The bpf_fifo_dequeue prog returns a skb which is a pointer. The pointer
is treated as a 32bit value and sign extend to 64bit in epilogue. This
behavior is right for most bpf prog types but wrong for struct ops which
requires LoongArch ABI.

So let's sign extend struct ops return values according to the LoongArch
ABI ([1]) and return value spec in function model.

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

Cc: stable@vger.kernel.org
Fixes: 6abf17d ("LoongArch: BPF: Add struct ops support for trampoline")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
commit e82406c upstream.

The current implementation does not support struct argument. This causes
a oops when running bpf selftest:

  $ ./test_progs -a tracing_struct
  Oops[#1]:
  CPU -1 Unable to handle kernel paging request at virtual address 0000000000000018, era == 9000000085bef268, ra == 90000000844f3938
  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     1-...0: (19 ticks this GP) idle=1094/1/0x4000000000000000 softirq=1380/1382 fqs=801
  rcu:     (detected by 0, t=5252 jiffies, g=1197, q=52 ncpus=4)
  Sending NMI from CPU 0 to CPUs 1:
  rcu: rcu_preempt kthread starved for 2495 jiffies! g1197 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=2
  rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  rcu: RCU grace-period kthread stack dump:
  task:rcu_preempt     state:I stack:0     pid:15    tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
  Stack : 9000000100423e80 0000000000000402 0000000000000010 90000001003b0680
          9000000085d88000 0000000000000000 0000000000000040 9000000087159350
          9000000085c2b9b0 0000000000000001 900000008704a000 0000000000000005
          00000000ffff355b 00000000ffff355b 0000000000000000 0000000000000004
          9000000085d90510 0000000000000000 0000000000000002 7b5d998f8281e86e
          00000000ffff355c 7b5d998f8281e86e 000000000000003f 9000000087159350
          900000008715bf98 0000000000000005 9000000087036000 900000008704a000
          9000000100407c98 90000001003aff80 900000008715c4c0 9000000085c2b9b0
          00000000ffff355b 9000000085c33d3c 00000000000000b4 0000000000000000
          9000000007002150 00000000ffff355b 9000000084615480 0000000007000002
          ...
  Call Trace:
  [<9000000085c2a868>] __schedule+0x410/0x1520
  [<9000000085c2b9ac>] schedule+0x34/0x190
  [<9000000085c33d38>] schedule_timeout+0x98/0x140
  [<90000000845e9120>] rcu_gp_fqs_loop+0x5f8/0x868
  [<90000000845ed538>] rcu_gp_kthread+0x260/0x2e0
  [<900000008454e8a4>] kthread+0x144/0x238
  [<9000000085c26b60>] ret_from_kernel_thread+0x28/0xc8
  [<90000000844f20e4>] ret_from_kernel_thread_asm+0xc/0x88

  rcu: Stack dump where RCU GP kthread last ran:
  Sending NMI from CPU 0 to CPUs 2:
  NMI backtrace for cpu 2 skipped: idling at idle_exit+0x0/0x4

Reject it for now.

Cc: stable@vger.kernel.org
Fixes: f9b6b41 ("LoongArch: BPF: Add basic bpf trampoline support")
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
commit f04aad3 upstream.

syzkaller discovered the following crash: (kernel BUG)

[   44.607039] ------------[ cut here ]------------
[   44.607422] kernel BUG at mm/userfaultfd.c:2067!
[   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460

<snip other registers, drop unreliable trace>

[   44.617726] Call Trace:
[   44.617926]  <TASK>
[   44.619284]  userfaultfd_release+0xef/0x1b0
[   44.620976]  __fput+0x3f9/0xb60
[   44.621240]  fput_close_sync+0x110/0x210
[   44.622222]  __x64_sys_close+0x8f/0x120
[   44.622530]  do_syscall_64+0x5b/0x2f0
[   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   44.623244] RIP: 0033:0x7f365bb3f227

Kernel panics because it detects UFFD inconsistency during
userfaultfd_release_all().  Specifically, a VMA which has a valid pointer
to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.

The inconsistency is caused in ksm_madvise(): when user calls madvise()
with MADV_UNMEARGEABLE on a VMA that is registered for UFFD in MINOR mode,
it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.

Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int and
int are 32-bit wide.  This setup causes the following mishap during the &=
~VM_MERGEABLE assignment.

VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
promoted to unsigned long before the & operation.  This promotion fills
upper 32 bits with leading 0s, as we're doing unsigned conversion (and
even for a signed conversion, this wouldn't help as the leading bit is 0).
& operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
the upper 32-bits of its value.

Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
BIT() macro.

Note: other VM_* flags are not affected: This only happens to the
VM_MERGEABLE flag, as the other VM_* flags are all constants of type int
and after ~ operation, they end up with leading 1 and are thus converted
to unsigned long with leading 1s.

Note 2:
After commit 31defc3 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:

[   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067

but the root-cause (flag-drop) remains the same.

[akpm@linux-foundation.org: rust bindgen wasn't able to handle BIT(), from Miguel]
  Link: https://lore.kernel.org/oe-kbuild-all/202510030449.VfSaAjvd-lkp@intel.com/
Link: https://lkml.kernel.org/r/20251001090353.57523-2-acsjakub@amazon.de
Fixes: 7677f7f ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: SeongJae Park <sj@kernel.org>
Tested-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
commit 3c3fac6 upstream.

In ext4_mb_init(), ext4_mb_avg_fragment_size_destroy() may be called
when sbi->s_mb_avg_fragment_size remains uninitialized (e.g., if groupinfo
slab cache allocation fails). Since ext4_mb_avg_fragment_size_destroy()
lacks null pointer checking, this leads to a null pointer dereference.

==================================================================
EXT4-fs: no memory for groupinfo slab cache
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP PTI
CPU:2 UID: 0 PID: 87 Comm:mount Not tainted 6.17.0-rc2 torvalds#1134 PREEMPT(none)
RIP: 0010:_raw_spin_lock_irqsave+0x1b/0x40
Call Trace:
 <TASK>
 xa_destroy+0x61/0x130
 ext4_mb_init+0x483/0x540
 __ext4_fill_super+0x116d/0x17b0
 ext4_fill_super+0xd3/0x280
 get_tree_bdev_flags+0x132/0x1d0
 vfs_get_tree+0x29/0xd0
 do_new_mount+0x197/0x300
 __x64_sys_mount+0x116/0x150
 do_syscall_64+0x50/0x1c0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================

Therefore, add necessary null check to ext4_mb_avg_fragment_size_destroy()
to prevent this issue. The same fix is also applied to
ext4_mb_largest_free_orders_destroy().

Reported-by: syzbot+1713b1aa266195b916c2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1713b1aa266195b916c2
Cc: stable@kernel.org
Fixes: f7eaacb ("ext4: convert free groups order lists to xarrays")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
…sizes

The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-4K
pages - for instance, currently, Loongson 3A5000/6000 devices (of the
LoongArch architecture) commonly uses 16K kernel pages.

Per my testing Intel Xe/Arc families of GPUs works on at least
Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable
BAR" were enabled in the EFI firmware settings. I tested this patch series
on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: <stable@vger.kernel.org>
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Haien Liang <27873200@qq.com>
Tested-by: Shirong Liu <lsr1024@qq.com>
Tested-by: Haofeng Wu <s2600cw2@126.com>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <429839446@qq.com>
Signed-off-by: Shang Yatsen <429839446@qq.com>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>

[Mingcong Bai: Resolved a minor merge conflict post-6.16 in
 drivers/gpu/drm/xe/xe_bo.c]

Link: https://lore.kernel.org/all/20250613-upstream-xe-non-4k-v2-v2-1-934f82249f8a@aosc.io/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
…on 3C6000 series steppings

Older steppings of the Loongson 3C6000 series incorrectly report the
supported link speeds on their PCIe bridges (device IDs 3c19, 3c29) as
only 2.5 GT/s, despite the upstream bus supporting speeds from 2.5 GT/s
up to 16 GT/s.

As a result, certain PCIe devices would be incorrectly probed as a Gen1-
only, even if higher link speeds are supported, harming performance and
prevents dynamic link speed functionality from being enabled in drivers
such as amdgpu.

Manually override the `supported_speeds` field for affected PCIe bridges
with those found on the upstream bus to correctly reflect the supported
link speeds.

This patch is found from AOSC OS[1].

Link: AOSC-Tracking#2 #1
Tested-by: Lain Fearyncess Yang <fsf@live.com>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Ziyao <liziyao@uniontech.com>

Link: https://lore.kernel.org/loongarch/20250822-loongson-pci1-v1-1-39aabbd11fbd@uniontech.com/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
While testing my ROCm port for LoongArch and AArch64 (patches pending) on
the following platforms:

- LoongArch ...
  - Loongson AC612A0_V1.1 (Loongson 3C6000/S) + AMD Radeon RX 6800
- AArch64 ...
  - FD30M51 (Phytium FT-D3000) + AMD Radeon RX 7600
  - Huawei D920S10 (Huawei Kunpeng 920) + AMD Radeon RX 7600

When HSA_AMD_SVM is enabled, amdgpu would fail to initialise at all on
LoongArch (no output):

  amdgpu 0000:0d:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
  CPU 0 Unable to handle kernel paging request at virtual address ffffffffff800034, era == 9000000001058044, ra == 9000000001058660
  Oops[#1]:
  CPU: 0 UID: 0 PID: 202 Comm: kworker/0:3 Not tainted 6.16.0+ torvalds#103 PREEMPT(full)
  Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
  Workqueue: events work_for_cpu_fn
  pc 9000000001058044 ra 9000000001058660 tp 9000000101500000 sp 9000000101503aa0
  a0 ffffffffff800000 a1 0000000ffffe0000 a2 0000000000000000 a3 90000001207c58e0
  a4 9000000001a4c310 a5 0000000000000001 a6 0000000000000000 a7 0000000000000001
  t0 000003ffff800000 t1 0000000000000001 t2 0000040000000000 t3 03ffff0000002000
  t4 0000000000000000 t5 0001010101010101 t6 ffff800000000000 t7 0001000000000000
  t8 000000000000002f u0 0000000000800000 s9 9000000002026000 s0 90000001207c58e0
  s1 0000000000000001 s2 9000000001935c40 s3 0000001000000000 s4 0000000000000001
  s5 0000000ffffe0000 s6 0000000000000040 s7 0001000000000001 s8 0001000000000000
     ra: 9000000001058660 memmap_init_zone_device+0x120/0x1b0
    ERA: 9000000001058044 __init_zone_device_page.constprop.0+0x4/0x1a0
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 00000004 (PPLV0 +PIE -PWE)
   EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00020000 [PIS] (IS= ECode=2 EsubCode=0)
   BADV: ffffffffff800034
   PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
  Modules linked in: amdgpu(+) vfat fat cfg80211 rfkill 8021q garp stp mrp llc snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic drm_client_lib drm_ttm_helper syscopyarea ttm sysfillrect sysimgblt fb_sys_fops drm_panel_backlight_quirks video drm_exec drm_suballoc_helper amdxcp mfd_core drm_buddy gpu_sched drm_display_helper drm_kms_helper cec snd_hda_intel ipmi_ssif snd_intel_dspcfg snd_hda_codec snd_hda_core acpi_ipmi snd_hwdep snd_pcm fb loongson3_cpufreq lcd igc snd_timer ipmi_si spi_loongson_pci spi_loongson_core snd ipmi_devintf soundcore ipmi_msghandler binfmt_misc fuse drm drm_panel_orientation_quirks backlight dm_mod dax nfnetlink
  Process kworker/0:3 (pid: 202, threadinfo=00000000eb7cd5d6, task=000000004ca22b1b)
  Stack : 0000000000001440 0000000000000000 ffffffffff800000 0000000000000001
          90000000020b5978 9000000101503b38 0000000000000001 0000000000000001
          0000000000000000 90000000020b5978 90000000020b3f48 0000000000001440
          0000000000000000 90000001207c58e0 90000001207c5970 9000000000575e20
          90000000010e2e00 90000000020b3f48 900000000205c238 0000000000000000
          00000000000001d3 90000001207c58e0 9000000001958f28 9000000120790848
          90000001207b3510 0000000000000000 9000000120780000 9000000120780010
          90000001207d6000 90000001207c58e0 90000001015660c8 9000000120780000
          0000000000000000 90000000005763a8 90000001207c58e0 00000003ff000000
          9000000120780000 ffff80000296b820 900000012078f968 90000001207c6000
          ...
  Call Trace:
  [<9000000001058044>] __init_zone_device_page.constprop.0+0x4/0x1a0
  [<900000000105865c>] memmap_init_zone_device+0x11c/0x1b0
  [<9000000000575e1c>] memremap_pages+0x24c/0x7b0
  [<90000000005763a4>] devm_memremap_pages+0x24/0x80
  [<ffff80000296b81c>] kgd2kfd_init_zone_device+0x11c/0x220 [amdgpu]
  [<ffff80000265d09c>] amdgpu_device_init+0x27dc/0x2bf0 [amdgpu]
  [<ffff80000265ece8>] amdgpu_driver_load_kms+0x18/0x90 [amdgpu]
  [<ffff800002651fbc>] amdgpu_pci_probe+0x22c/0x890 [amdgpu]
  [<9000000000916adc>] local_pci_probe+0x3c/0xb0
  [<90000000002976c8>] work_for_cpu_fn+0x18/0x30
  [<900000000029aeb4>] process_one_work+0x164/0x320
  [<900000000029b96c>] worker_thread+0x37c/0x4a0
  [<90000000002a695c>] kthread+0x12c/0x220
  [<9000000001055b64>] ret_from_kernel_thread+0x24/0xc0
  [<9000000000237524>] ret_from_kernel_thread_asm+0xc/0x88

  Code: 00000000  00000000  0280040d <2980d08d> 02bffc0e  2980c08e  02c0208d  29c0208d  1400004f

  ---[ end trace 0000000000000000 ]---

Or lock up and/or driver reset during computate tasks, such as when
running llama.cpp over ROCm, at which point the compute process must be
killed before the reset could complete:

  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1202
  amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 3
  amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1004
  amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 2
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 1
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 0
  amdgpu: Failed to quiesce KFD
  amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
  amdgpu 0000:0a:00.0: amdgpu: Dumping IP State Completed
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MODE1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume

Disabling the aforementioned option makes the issue go away, though it is
unclear whether this is a platform-specific issue or one that lies within
the amdkfd code.

This patch has been tested on all the aforementioned platform
combinations, and sent as an RFC to encourage discussion.

Signed-off-by: Zhang Yuhao <xinmu@xinmu.moe>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Mingcong Bai <jeffbai@aosc.io>

Link: https://lore.kernel.org/all/20250814032153.227285-1-jeffbai@aosc.io/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Oct 19, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>

Bug: https://lore.kernel.org/all/b8da4aec-4cca-4eb0-ba87-5f8641aa2ca9@leemhuis.info/
Signed-off-by: Kexy Biscuit <kexybiscuit@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
The mlx5_irq_alloc() function can inadvertently free the entire rmap
and end up in a crash[1] when the other threads tries to access this,
when request_irq() fails due to exhausted IRQ vectors. This commit
modifies the cleanup to remove only the specific IRQ mapping that was
just added.

This prevents removal of other valid mappings and ensures precise
cleanup of the failed IRQ allocation's associated glue object.

Note: This error is observed when both fwctl and rds configs are enabled.

[1]
mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
general protection fault, probably for non-canonical address
0xe277a58fde16f291: 0000 [#1] SMP NOPTI

RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Call Trace:
   <TASK>
   ? show_trace_log_lvl+0x1d6/0x2f9
   ? show_trace_log_lvl+0x1d6/0x2f9
   ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
   ? __die_body.cold+0x8/0xa
   ? die_addr+0x39/0x53
   ? exc_general_protection+0x1c4/0x3e9
   ? dev_vprintk_emit+0x5f/0x90
   ? asm_exc_general_protection+0x22/0x27
   ? free_irq_cpu_rmap+0x23/0x7d
   mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
   irq_pool_request_vector+0x7d/0x90 [mlx5_core]
   mlx5_irq_request+0x2e/0xe0 [mlx5_core]
   mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
   comp_irq_request_pci+0x64/0xf0 [mlx5_core]
   create_comp_eq+0x71/0x385 [mlx5_core]
   ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
   mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
   ? xas_load+0x8/0x91
   mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
   mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
   mlx5e_open_channels+0xad/0x250 [mlx5_core]
   mlx5e_open_locked+0x3e/0x110 [mlx5_core]
   mlx5e_open+0x23/0x70 [mlx5_core]
   __dev_open+0xf1/0x1a5
   __dev_change_flags+0x1e1/0x249
   dev_change_flags+0x21/0x5c
   do_setlink+0x28b/0xcc4
   ? __nla_parse+0x22/0x3d
   ? inet6_validate_link_af+0x6b/0x108
   ? cpumask_next+0x1f/0x35
   ? __snmp6_fill_stats64.constprop.0+0x66/0x107
   ? __nla_validate_parse+0x48/0x1e6
   __rtnl_newlink+0x5ff/0xa57
   ? kmem_cache_alloc_trace+0x164/0x2ce
   rtnl_newlink+0x44/0x6e
   rtnetlink_rcv_msg+0x2bb/0x362
   ? __netlink_sendskb+0x4c/0x6c
   ? netlink_unicast+0x28f/0x2ce
   ? rtnl_calcit.isra.0+0x150/0x146
   netlink_rcv_skb+0x5f/0x112
   netlink_unicast+0x213/0x2ce
   netlink_sendmsg+0x24f/0x4d9
   __sock_sendmsg+0x65/0x6a
   ____sys_sendmsg+0x28f/0x2c9
   ? import_iovec+0x17/0x2b
   ___sys_sendmsg+0x97/0xe0
   __sys_sendmsg+0x81/0xd8
   do_syscall_64+0x35/0x87
   entry_SYSCALL_64_after_hwframe+0x6e/0x0
RIP: 0033:0x7fc328603727
Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
   </TASK>
---[ end trace f43ce73c3c2b13a2 ]---
RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
FS:  00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
kvm-guest: disable async PF for cpu 0

Fixes: 3354822 ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com>
Tested-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763381768-1234998-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
The kernel test has reported:

  BUG: unable to handle page fault for address: fffba000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  *pde = 03171067 *pte = 00000000
  Oops: Oops: 0002 [#1]
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G                T   6.18.0-rc2-00031-gec7f31b2a2d3 #1 NONE  a1d066dfe789f54bc7645c7989957d2bdee593ca
  Tainted: [T]=RANDSTRUCT
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  EIP: memset (arch/x86/include/asm/string_32.h:168 arch/x86/lib/memcpy_32.c:17)
  Code: a5 8b 4d f4 83 e1 03 74 02 f3 a4 83 c4 04 5e 5f 5d 2e e9 73 41 01 00 90 90 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 89 d0 89 f7 <f3> aa 89 f0 5e 5f 5d 2e e9 53 41 01 00 cc cc cc 55 89 e5 53 57 56
  EAX: 0000006b EBX: 00000015 ECX: 001fefff EDX: 0000006b
  ESI: fffb9000 EDI: fffba000 EBP: c611fbf0 ESP: c611fbe8
  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010287
  CR0: 80050033 CR2: fffba000 CR3: 0316e000 CR4: 00040690
  Call Trace:
   poison_element (mm/mempool.c:83 mm/mempool.c:102)
   mempool_init_node (mm/mempool.c:142 mm/mempool.c:226)
   mempool_init_noprof (mm/mempool.c:250 (discriminator 1))
   ? mempool_alloc_pages (mm/mempool.c:640)
   bio_integrity_initfn (block/bio-integrity.c:483 (discriminator 8))
   ? mempool_alloc_pages (mm/mempool.c:640)
   do_one_initcall (init/main.c:1283)

Christoph found out this is due to the poisoning code not dealing
properly with CONFIG_HIGHMEM because only the first page is mapped but
then the whole potentially high-order page is accessed.

We could give up on HIGHMEM here, but it's straightforward to fix this
with a loop that's mapping, poisoning or checking and unmapping
individual pages.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511111411.9ebfa1ba-lkp@intel.com
Analyzed-by: Christoph Hellwig <hch@lst.de>
Fixes: bdfedb7 ("mm, mempool: poison elements backed by slab allocator")
Cc: stable@vger.kernel.org
Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113-mempool-poison-v1-1-233b3ef984c3@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
nvme_fc_delete_assocation() waits for pending I/O to complete before
returning, and an error can cause ->ioerr_work to be queued after
cancel_work_sync() had been called.  Move the call to cancel_work_sync() to
be after nvme_fc_delete_association() to ensure ->ioerr_work is not running
when the nvme_fc_ctrl object is freed.  Otherwise the following can occur:

[ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL
[ 1135.917705] ------------[ cut here ]------------
[ 1135.922336] kernel BUG at lib/list_debug.c:52!
[ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary)
[ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025
[ 1135.950969] Workqueue:  0x0 (nvme-wq)
[ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b
[ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046
[ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000
[ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0
[ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08
[ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100
[ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0
[ 1136.020677] FS:  0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000
[ 1136.028765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0
[ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1136.055910] PKRU: 55555554
[ 1136.058623] Call Trace:
[ 1136.061074]  <TASK>
[ 1136.063179]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.067540]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.071898]  ? move_linked_works+0x4a/0xa0
[ 1136.075998]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.081744]  ? __die_body.cold+0x8/0x12
[ 1136.085584]  ? die+0x2e/0x50
[ 1136.088469]  ? do_trap+0xca/0x110
[ 1136.091789]  ? do_error_trap+0x65/0x80
[ 1136.095543]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.101289]  ? exc_invalid_op+0x50/0x70
[ 1136.105127]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.110874]  ? asm_exc_invalid_op+0x1a/0x20
[ 1136.115059]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.120806]  move_linked_works+0x4a/0xa0
[ 1136.124733]  worker_thread+0x216/0x3a0
[ 1136.128485]  ? __pfx_worker_thread+0x10/0x10
[ 1136.132758]  kthread+0xfa/0x240
[ 1136.135904]  ? __pfx_kthread+0x10/0x10
[ 1136.139657]  ret_from_fork+0x31/0x50
[ 1136.143236]  ? __pfx_kthread+0x10/0x10
[ 1136.146988]  ret_from_fork_asm+0x1a/0x30
[ 1136.150915]  </TASK>

Fixes: 19fce04 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Cc: stable@vger.kernel.org
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
If the allocation of tl_hba->sh fails in tcm_loop_driver_probe() and we
attempt to dereference it in tcm_loop_tpg_address_show() we will get a
segfault, see below for an example. So, check tl_hba->sh before
dereferencing it.

  Unable to allocate struct scsi_host
  BUG: kernel NULL pointer dereference, address: 0000000000000194
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 8356 Comm: tokio-runtime-w Not tainted 6.6.104.2-4.azl3 #1
  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
  RIP: 0010:tcm_loop_tpg_address_show+0x2e/0x50 [tcm_loop]
...
  Call Trace:
   <TASK>
   configfs_read_iter+0x12d/0x1d0 [configfs]
   vfs_read+0x1b5/0x300
   ksys_read+0x6f/0xf0
...

Cc: stable@vger.kernel.org
Fixes: 2628b35 ("tcm_loop: Show address of tpg in configfs")
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Allen Pais <apais@linux.microsoft.com>
Link: https://patch.msgid.link/1762370746-6304-1-git-send-email-hamzamahfooz@linux.microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
This reverts commit d02c2e4.

Mauro reports that this breaks APEI notifications on his QEMU setup
because the "reserved for firmware" region still needs to be writable
by Linux in order to signal _back_ to the firmware after processing
the reported error:

  | {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  | ...
  | [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
  | Unable to handle kernel write to read-only memory at virtual address ffff800080035018
  | Mem abort info:
  |   ESR = 0x000000009600004f
  |   EC = 0x25: DABT (current EL), IL = 32 bits
  |   SET = 0, FnV = 0
  |   EA = 0, S1PTW = 0
  |   FSC = 0x0f: level 3 permission fault
  | Data abort info:
  |   ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000
  |   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
  |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  | swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000
  | pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483
  | Internal error: Oops: 000000009600004f [#1]  SMP

For now, revert the offending commit. We can presumably switch back to
PAGE_KERNEL when bringing this back in the future.

Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
Since commit 30f241f ("xsk: Fix immature cq descriptor
production"), the descriptor number is stored in skb control block and
xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
pool's completion queue.

skb control block shouldn't be used for this purpose as after transmit
xsk doesn't have control over it and other subsystems could use it. This
leads to the following kernel panic due to a NULL pointer dereference.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy)  Debian 6.16.12-1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:xsk_destruct_skb+0xd0/0x180
 [...]
 Call Trace:
  <IRQ>
  ? napi_complete_done+0x7a/0x1a0
  ip_rcv_core+0x1bb/0x340
  ip_rcv+0x30/0x1f0
  __netif_receive_skb_one_core+0x85/0xa0
  process_backlog+0x87/0x130
  __napi_poll+0x28/0x180
  net_rx_action+0x339/0x420
  handle_softirqs+0xdc/0x320
  ? handle_edge_irq+0x90/0x1e0
  do_softirq.part.0+0x3b/0x60
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x60/0x70
  __dev_direct_xmit+0x14e/0x1f0
  __xsk_generic_xmit+0x482/0xb70
  ? __remove_hrtimer+0x41/0xa0
  ? __xsk_generic_xmit+0x51/0xb70
  ? _raw_spin_unlock_irqrestore+0xe/0x40
  xsk_sendmsg+0xda/0x1c0
  __sys_sendto+0x1ee/0x200
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x84/0x2f0
  ? __pfx_pollwake+0x10/0x10
  ? __rseq_handle_notify_resume+0xad/0x4c0
  ? restore_fpregs_from_fpstate+0x3c/0x90
  ? switch_fpu_return+0x5b/0xe0
  ? do_syscall_64+0x204/0x2f0
  ? do_syscall_64+0x204/0x2f0
  ? do_syscall_64+0x204/0x2f0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  </TASK>
 [...]
 Kernel panic - not syncing: Fatal exception in interrupt
 Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Instead use the skb destructor_arg pointer along with pointer tagging.
As pointers are always aligned to 8B, use the bottom bit to indicate
whether this a single address or an allocated struct containing several
addresses.

Fixes: 30f241f ("xsk: Fix immature cq descriptor production")
Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
…ptcp_do_fastclose().

syzbot reported divide-by-zero in __tcp_select_window() by
MPTCP socket. [0]

We had a similar issue for the bare TCP and fixed in commit
499350a ("tcp: initialize rcv_mss to TCP_MIN_MSS instead
of 0").

Let's apply the same fix to mptcp_do_fastclose().

[0]:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336
Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00
RSP: 0018:ffffc90003017640 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028
R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 tcp_select_window net/ipv4/tcp_output.c:281 [inline]
 __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568
 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline]
 tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836
 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793
 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253
 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776
 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg+0xe5/0x270 net/socket.c:742
 __sys_sendto+0x3bd/0x520 net/socket.c:2244
 __do_sys_sendto net/socket.c:2251 [inline]
 __se_sys_sendto net/socket.c:2247 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2247
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f66e998f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006
 </TASK>

Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose")
Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
The crash in process_v2_sparse_read() for fscrypt-encrypted directories
has been reported. Issue takes place for Ceph msgr2 protocol in secure
mode. It can be reproduced by the steps:

sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure

(1) mkdir /mnt/cephfs/fscrypt-test-3
(2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3
(3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3
(4) fscrypt lock /mnt/cephfs/fscrypt-test-3
(5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3
(6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar
(7) Issue has been triggered

[  408.072247] ------------[ cut here ]------------
[  408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865
ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072267] Modules linked in: intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass
polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse
serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg
pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+
[  408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  408.072310] Workqueue: ceph-msgr ceph_con_workfn
[  408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8
8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06
fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85
[  408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246
[  408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38
[  408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8
[  408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8
[  408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000
[  408.072329] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.072331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.072336] PKRU: 55555554
[  408.072337] Call Trace:
[  408.072338]  <TASK>
[  408.072340]  ? sched_clock_noinstr+0x9/0x10
[  408.072344]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.072347]  ? _raw_spin_unlock+0xe/0x40
[  408.072349]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.072353]  ? __kasan_check_write+0x14/0x30
[  408.072357]  ? mutex_lock+0x84/0xe0
[  408.072359]  ? __pfx_mutex_lock+0x10/0x10
[  408.072361]  ceph_con_workfn+0x27e/0x10e0
[  408.072364]  ? metric_delayed_work+0x311/0x2c50
[  408.072367]  process_one_work+0x611/0xe20
[  408.072371]  ? __kasan_check_write+0x14/0x30
[  408.072373]  worker_thread+0x7e3/0x1580
[  408.072375]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.072378]  ? __pfx_worker_thread+0x10/0x10
[  408.072381]  kthread+0x381/0x7a0
[  408.072383]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.072385]  ? __pfx_kthread+0x10/0x10
[  408.072387]  ? __kasan_check_write+0x14/0x30
[  408.072389]  ? recalc_sigpending+0x160/0x220
[  408.072392]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.072394]  ? calculate_sigpending+0x78/0xb0
[  408.072395]  ? __pfx_kthread+0x10/0x10
[  408.072397]  ret_from_fork+0x2b6/0x380
[  408.072400]  ? __pfx_kthread+0x10/0x10
[  408.072402]  ret_from_fork_asm+0x1a/0x30
[  408.072406]  </TASK>
[  408.072407] ---[ end trace 0000000000000000 ]---
[  408.072418] Oops: general protection fault, probably for non-canonical
address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
[  408.072984] KASAN: null-ptr-deref in range [0x0000000000000000-
0x0000000000000007]
[  408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G        W
6.17.0-rc7+ #1 PREEMPT(voluntary)
[  408.073886] Tainted: [W]=WARN
[  408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  408.074468] Workqueue: ceph-msgr ceph_con_workfn
[  408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00
00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16
84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.079159] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.079736] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.080376] PKRU: 55555554
[  408.080513] Call Trace:
[  408.080630]  <TASK>
[  408.080729]  ceph_con_v2_try_read+0x49b9/0x72f0
[  408.081115]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.081348]  ? _raw_spin_unlock+0xe/0x40
[  408.081538]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.081768]  ? __kasan_check_write+0x14/0x30
[  408.081986]  ? mutex_lock+0x84/0xe0
[  408.082160]  ? __pfx_mutex_lock+0x10/0x10
[  408.082343]  ceph_con_workfn+0x27e/0x10e0
[  408.082529]  ? metric_delayed_work+0x311/0x2c50
[  408.082737]  process_one_work+0x611/0xe20
[  408.082948]  ? __kasan_check_write+0x14/0x30
[  408.083156]  worker_thread+0x7e3/0x1580
[  408.083331]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.083557]  ? __pfx_worker_thread+0x10/0x10
[  408.083751]  kthread+0x381/0x7a0
[  408.083922]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.084139]  ? __pfx_kthread+0x10/0x10
[  408.084310]  ? __kasan_check_write+0x14/0x30
[  408.084510]  ? recalc_sigpending+0x160/0x220
[  408.084708]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.084917]  ? calculate_sigpending+0x78/0xb0
[  408.085138]  ? __pfx_kthread+0x10/0x10
[  408.085335]  ret_from_fork+0x2b6/0x380
[  408.085525]  ? __pfx_kthread+0x10/0x10
[  408.085720]  ret_from_fork_asm+0x1a/0x30
[  408.085922]  </TASK>
[  408.086036] Modules linked in: intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass
polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse
serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg
pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.087778] ---[ end trace 0000000000000000 ]---
[  408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00
00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16
84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.091035] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.091452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.092530] PKRU: 55555554
[  417.112915]
==================================================================
[  417.113491] BUG: KASAN: slab-use-after-free in
__mutex_lock.constprop.0+0x1522/0x1610
[  417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951

[  417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G      D W
6.17.0-rc7+ #1 PREEMPT(voluntary)
[  417.114592] Tainted: [D]=DIE, [W]=WARN
[  417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  417.114596] Workqueue: events handle_timeout
[  417.114601] Call Trace:
[  417.114602]  <TASK>
[  417.114604]  dump_stack_lvl+0x5c/0x90
[  417.114610]  print_report+0x171/0x4dc
[  417.114613]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114617]  ? kasan_complete_mode_report_info+0x80/0x220
[  417.114621]  kasan_report+0xbd/0x100
[  417.114625]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114628]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114630]  __asan_report_load4_noabort+0x14/0x30
[  417.114633]  __mutex_lock.constprop.0+0x1522/0x1610
[  417.114635]  ? queue_con_delay+0x8d/0x200
[  417.114638]  ? __pfx___mutex_lock.constprop.0+0x10/0x10
[  417.114641]  ? __send_subscribe+0x529/0xb20
[  417.114644]  __mutex_lock_slowpath+0x13/0x20
[  417.114646]  mutex_lock+0xd4/0xe0
[  417.114649]  ? __pfx_mutex_lock+0x10/0x10
[  417.114652]  ? ceph_monc_renew_subs+0x2a/0x40
[  417.114654]  ceph_con_keepalive+0x22/0x110
[  417.114656]  handle_timeout+0x6b3/0x11d0
[  417.114659]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114662]  ? __pfx_handle_timeout+0x10/0x10
[  417.114664]  ? queue_delayed_work_on+0x8e/0xa0
[  417.114669]  process_one_work+0x611/0xe20
[  417.114672]  ? __kasan_check_write+0x14/0x30
[  417.114676]  worker_thread+0x7e3/0x1580
[  417.114678]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114682]  ? __pfx_sched_setscheduler_nocheck+0x10/0x10
[  417.114687]  ? __pfx_worker_thread+0x10/0x10
[  417.114689]  kthread+0x381/0x7a0
[  417.114692]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  417.114694]  ? __pfx_kthread+0x10/0x10
[  417.114697]  ? __kasan_check_write+0x14/0x30
[  417.114699]  ? recalc_sigpending+0x160/0x220
[  417.114703]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114705]  ? calculate_sigpending+0x78/0xb0
[  417.114707]  ? __pfx_kthread+0x10/0x10
[  417.114710]  ret_from_fork+0x2b6/0x380
[  417.114713]  ? __pfx_kthread+0x10/0x10
[  417.114715]  ret_from_fork_asm+0x1a/0x30
[  417.114720]  </TASK>

[  417.125171] Allocated by task 2:
[  417.125333]  kasan_save_stack+0x26/0x60
[  417.125522]  kasan_save_track+0x14/0x40
[  417.125742]  kasan_save_alloc_info+0x39/0x60
[  417.125945]  __kasan_slab_alloc+0x8b/0xb0
[  417.126133]  kmem_cache_alloc_node_noprof+0x13b/0x460
[  417.126381]  copy_process+0x320/0x6250
[  417.126595]  kernel_clone+0xb7/0x840
[  417.126792]  kernel_thread+0xd6/0x120
[  417.126995]  kthreadd+0x85c/0xbe0
[  417.127176]  ret_from_fork+0x2b6/0x380
[  417.127378]  ret_from_fork_asm+0x1a/0x30

[  417.127692] Freed by task 0:
[  417.127851]  kasan_save_stack+0x26/0x60
[  417.128057]  kasan_save_track+0x14/0x40
[  417.128267]  kasan_save_free_info+0x3b/0x60
[  417.128491]  __kasan_slab_free+0x6c/0xa0
[  417.128708]  kmem_cache_free+0x182/0x550
[  417.128906]  free_task+0xeb/0x140
[  417.129070]  __put_task_struct+0x1d2/0x4f0
[  417.129259]  __put_task_struct_rcu_cb+0x15/0x20
[  417.129480]  rcu_do_batch+0x3d3/0xe70
[  417.129681]  rcu_core+0x549/0xb30
[  417.129839]  rcu_core_si+0xe/0x20
[  417.130005]  handle_softirqs+0x160/0x570
[  417.130190]  __irq_exit_rcu+0x189/0x1e0
[  417.130369]  irq_exit_rcu+0xe/0x20
[  417.130531]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.130768]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.131082] Last potentially related work creation:
[  417.131305]  kasan_save_stack+0x26/0x60
[  417.131484]  kasan_record_aux_stack+0xae/0xd0
[  417.131695]  __call_rcu_common+0xcd/0x14b0
[  417.131909]  call_rcu+0x31/0x50
[  417.132071]  delayed_put_task_struct+0x128/0x190
[  417.132295]  rcu_do_batch+0x3d3/0xe70
[  417.132478]  rcu_core+0x549/0xb30
[  417.132658]  rcu_core_si+0xe/0x20
[  417.132808]  handle_softirqs+0x160/0x570
[  417.132993]  __irq_exit_rcu+0x189/0x1e0
[  417.133181]  irq_exit_rcu+0xe/0x20
[  417.133353]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.133584]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.133921] Second to last potentially related work creation:
[  417.134183]  kasan_save_stack+0x26/0x60
[  417.134362]  kasan_record_aux_stack+0xae/0xd0
[  417.134566]  __call_rcu_common+0xcd/0x14b0
[  417.134782]  call_rcu+0x31/0x50
[  417.134929]  put_task_struct_rcu_user+0x58/0xb0
[  417.135143]  finish_task_switch.isra.0+0x5d3/0x830
[  417.135366]  __schedule+0xd30/0x5100
[  417.135534]  schedule_idle+0x5a/0x90
[  417.135712]  do_idle+0x25f/0x410
[  417.135871]  cpu_startup_entry+0x53/0x70
[  417.136053]  start_secondary+0x216/0x2c0
[  417.136233]  common_startup_64+0x13e/0x141

[  417.136894] The buggy address belongs to the object at ffff888124870000
                which belongs to the cache task_struct of size 10504
[  417.138122] The buggy address is located 52 bytes inside of
                freed 10504-byte region [ffff888124870000, ffff888124872908)

[  417.139465] The buggy address belongs to the physical page:
[  417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
pfn:0x124870
[  417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
pincount:0
[  417.141519] memcg:ffff88811aa20e01
[  417.141874] anon flags:
0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[  417.142600] page_type: f5(slab)
[  417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000
dead000000000001
[  417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000
ffff88811aa20e01
[  417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000
dead000000000001
[  417.144329] head: 0000000000000000 0000000000030003 00000000f5000000
ffff88811aa20e01
[  417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff
00000000ffffffff
[  417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff
0000000000000008
[  417.145485] page dumped because: kasan: bad access detected

[  417.145859] Memory state around the buggy address:
[  417.146094]  ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fc
[  417.146439]  ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fc
[  417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.147145]                                      ^
[  417.147387]  ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.147751]  ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.148123]
==================================================================

First of all, we have warning in get_bvec_at() because
cursor->total_resid contains zero value. And, finally,
we have crash in ceph_msg_data_advance() because
cursor->data is NULL. It means that get_bvec_at()
receives not initialized ceph_msg_data_cursor structure
because data is NULL and total_resid contains zero.

Moreover, we don't have likewise issue for the case of
Ceph msgr1 protocol because ceph_msg_data_cursor_init()
has been called before reading sparse data.

This patch adds calling of ceph_msg_data_cursor_init()
in the beginning of process_v2_sparse_read() with
the goal to guarantee that logic of reading sparse data
works correctly for the case of Ceph msgr2 protocol.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/73152
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
[WHAT]
IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic
fails with NULL pointer dereference. This can be reproduced with
both an eDP panel and a DP monitors connected.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted
6.16.0-99-custom torvalds#8 PREEMPT(voluntary)
 Hardware name: AMD ........
 RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu]
 Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49
 89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30
 c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02
 RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668
 RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000
 RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760
 R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000
 R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c
 FS:  000071f631b68700(0000) GS:ffff8b399f114000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0
 PKRU: 55555554
 Call Trace:
 <TASK>
 dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu]
 amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu]
 ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu]
 amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu]
 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400
 drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30
 drm_crtc_get_last_vbltimestamp+0x55/0x90
 drm_crtc_next_vblank_start+0x45/0xa0
 drm_atomic_helper_wait_for_fences+0x81/0x1f0
 ...

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 621e55f)
Cc: stable@vger.kernel.org
RevySR pushed a commit that referenced this pull request Dec 23, 2025
A synchronous external abort occurs on the Renesas RZ/G3S SoC if unbind is
executed after the configuration sequence described above:

modprobe usb_f_ecm
modprobe libcomposite
modprobe configfs
cd /sys/kernel/config/usb_gadget
mkdir -p g1
cd g1
echo "0x1d6b" > idVendor
echo "0x0104" > idProduct
mkdir -p strings/0x409
echo "0123456789" > strings/0x409/serialnumber
echo "Renesas." > strings/0x409/manufacturer
echo "Ethernet Gadget" > strings/0x409/product
mkdir -p functions/ecm.usb0
mkdir -p configs/c.1
mkdir -p configs/c.1/strings/0x409
echo "ECM" > configs/c.1/strings/0x409/configuration

if [ ! -L configs/c.1/ecm.usb0 ]; then
        ln -s functions/ecm.usb0 configs/c.1
fi

echo 11e20000.usb > UDC
echo 11e20000.usb > /sys/bus/platform/drivers/renesas_usbhs/unbind

The displayed trace is as follows:

 Internal error: synchronous external abort: 0000000096000010 [#1] SMP
 CPU: 0 UID: 0 PID: 188 Comm: sh Tainted: G M 6.17.0-rc7-next-20250922-00010-g41050493b2bd torvalds#55 PREEMPT
 Tainted: [M]=MACHINE_CHECK
 Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
 pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs]
 lr : usbhsg_update_pullup+0x3c/0x68 [renesas_usbhs]
 sp : ffff8000838b3920
 x29: ffff8000838b3920 x28: ffff00000d585780 x27: 0000000000000000
 x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000c3e3810
 x23: ffff00000d5e5c80 x22: ffff00000d5e5d40 x21: 0000000000000000
 x20: 0000000000000000 x19: ffff00000d5e5c80 x18: 0000000000000020
 x17: 2e30303230316531 x16: 312d7968703a7968 x15: 3d454d414e5f4344
 x14: 000000000000002c x13: 0000000000000000 x12: 0000000000000000
 x11: ffff00000f358f38 x10: ffff00000f358db0 x9 : ffff00000b41f418
 x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
 x5 : 8080808000000000 x4 : 000000004b5ccb9d x3 : 0000000000000000
 x2 : 0000000000000000 x1 : ffff800083790000 x0 : ffff00000d5e5c80
 Call trace:
 usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs] (P)
 usbhsg_pullup+0x4c/0x7c [renesas_usbhs]
 usb_gadget_disconnect_locked+0x48/0xd4
 gadget_unbind_driver+0x44/0x114
 device_remove+0x4c/0x80
 device_release_driver_internal+0x1c8/0x224
 device_release_driver+0x18/0x24
 bus_remove_device+0xcc/0x10c
 device_del+0x14c/0x404
 usb_del_gadget+0x88/0xc0
 usb_del_gadget_udc+0x18/0x30
 usbhs_mod_gadget_remove+0x24/0x44 [renesas_usbhs]
 usbhs_mod_remove+0x20/0x30 [renesas_usbhs]
 usbhs_remove+0x98/0xdc [renesas_usbhs]
 platform_remove+0x20/0x30
 device_remove+0x4c/0x80
 device_release_driver_internal+0x1c8/0x224
 device_driver_detach+0x18/0x24
 unbind_store+0xb4/0xb8
 drv_attr_store+0x24/0x38
 sysfs_kf_write+0x7c/0x94
 kernfs_fop_write_iter+0x128/0x1b8
 vfs_write+0x2ac/0x350
 ksys_write+0x68/0xfc
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0xf0
 el0t_64_sync_handler+0xa0/0xe4
 el0t_64_sync+0x198/0x19c
 Code: 7100003f 1a9f07e1 531c6c22 f9400001 (79400021)
 ---[ end trace 0000000000000000 ]---
 note: sh[188] exited with irqs disabled
 note: sh[188] exited with preempt_count 1

The issue occurs because usbhs_sys_function_pullup(), which accesses the IP
registers, is executed after the USBHS clocks have been disabled. The
problem is reproducible on the Renesas RZ/G3S SoC starting with the
addition of module stop in the clock enable/disable APIs. With module stop
functionality enabled, a bus error is expected if a master accesses a
module whose clock has been stopped and module stop activated.

Disable the IP clocks at the end of remove.

Cc: stable <stable@kernel.org>
Fixes: f1407d5 ("usb: renesas_usbhs: Add Renesas USBHS common code")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251027140741.557198-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
…em corrupted

commit 986835b upstream.

There's issue when file system corrupted:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1289!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next
RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0
RSP: 0018:ffff888117aafa30 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534
RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010
RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __ext4_journal_get_create_access+0x42/0x170
 ext4_getblk+0x319/0x6f0
 ext4_bread+0x11/0x100
 ext4_append+0x1e6/0x4a0
 ext4_init_new_dir+0x145/0x1d0
 ext4_mkdir+0x326/0x920
 vfs_mkdir+0x45c/0x740
 do_mkdirat+0x234/0x2f0
 __x64_sys_mkdir+0xd6/0x120
 do_syscall_64+0x5f/0xfa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The above issue occurs with us in errors=continue mode when accompanied by
storage failures. There have been many inconsistencies in the file system
data.
In the case of file system data inconsistency, for example, if the block
bitmap of a referenced block is not set, it can lead to the situation where
a block being committed is allocated and used again. As a result, the
following condition will not be satisfied then trigger BUG_ON. Of course,
it is entirely possible to construct a problematic image that can trigger
this BUG_ON through specific operations. In fact, I have constructed such
an image and easily reproduced this issue.
Therefore, J_ASSERT() holds true only under ideal conditions, but it may
not necessarily be satisfied in exceptional scenarios. Using J_ASSERT()
directly in abnormal situations would cause the system to crash, which is
clearly not what we want. So here we directly trigger a JBD abort instead
of immediately invoking BUG_ON.

Fixes: 470decc ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
commit 0cd8fee upstream.

Fix a race between inline data destruction and block mapping.

The function ext4_destroy_inline_data_nolock() changes the inode data
layout by clearing EXT4_INODE_INLINE_DATA and setting EXT4_INODE_EXTENTS.
At the same time, another thread may execute ext4_map_blocks(), which
tests EXT4_INODE_EXTENTS to decide whether to call ext4_ext_map_blocks()
or ext4_ind_map_blocks().

Without i_data_sem protection, ext4_ind_map_blocks() may receive inode
with EXT4_INODE_EXTENTS flag and triggering assert.

kernel BUG at fs/ext4/indirect.c:546!
EXT4-fs (loop2): unmounting filesystem.
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:ext4_ind_map_blocks.cold+0x2b/0x5a fs/ext4/indirect.c:546

Call Trace:
 <TASK>
 ext4_map_blocks+0xb9b/0x16f0 fs/ext4/inode.c:681
 _ext4_get_block+0x242/0x590 fs/ext4/inode.c:822
 ext4_block_write_begin+0x48b/0x12c0 fs/ext4/inode.c:1124
 ext4_write_begin+0x598/0xef0 fs/ext4/inode.c:1255
 ext4_da_write_begin+0x21e/0x9c0 fs/ext4/inode.c:3000
 generic_perform_write+0x259/0x5d0 mm/filemap.c:3846
 ext4_buffered_write_iter+0x15b/0x470 fs/ext4/file.c:285
 ext4_file_write_iter+0x8e0/0x17f0 fs/ext4/file.c:679
 call_write_iter include/linux/fs.h:2271 [inline]
 do_iter_readv_writev+0x212/0x3c0 fs/read_write.c:735
 do_iter_write+0x186/0x710 fs/read_write.c:861
 vfs_iter_write+0x70/0xa0 fs/read_write.c:902
 iter_file_splice_write+0x73b/0xc90 fs/splice.c:685
 do_splice_from fs/splice.c:763 [inline]
 direct_splice_actor+0x10f/0x170 fs/splice.c:950
 splice_direct_to_actor+0x33a/0xa10 fs/splice.c:896
 do_splice_direct+0x1a9/0x280 fs/splice.c:1002
 do_sendfile+0xb13/0x12c0 fs/read_write.c:1255
 __do_sys_sendfile64 fs/read_write.c:1323 [inline]
 __se_sys_sendfile64 fs/read_write.c:1309 [inline]
 __x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: c755e25 ("ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Message-ID: <20251104093326.697381-1-sdl@nppct.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
commit 3e0ae02 upstream.

Rust Binder contains the following unsafe operation:

	// SAFETY: A `NodeDeath` is never inserted into the death list
	// of any node other than its owner, so it is either in this
	// death list or in no death list.
	unsafe { node_inner.death_list.remove(self) };

This operation is unsafe because when touching the prev/next pointers of
a list element, we have to ensure that no other thread is also touching
them in parallel. If the node is present in the list that `remove` is
called on, then that is fine because we have exclusive access to that
list. If the node is not in any list, then it's also ok. But if it's
present in a different list that may be accessed in parallel, then that
may be a data race on the prev/next pointers.

And unfortunately that is exactly what is happening here. In
Node::release, we:

 1. Take the lock.
 2. Move all items to a local list on the stack.
 3. Drop the lock.
 4. Iterate the local list on the stack.

Combined with threads using the unsafe remove method on the original
list, this leads to memory corruption of the prev/next pointers. This
leads to crashes like this one:

	Unable to handle kernel paging request at virtual address 000bb9841bcac70e
	Mem abort info:
	  ESR = 0x0000000096000044
	  EC = 0x25: DABT (current EL), IL = 32 bits
	  SET = 0, FnV = 0
	  EA = 0, S1PTW = 0
	  FSC = 0x04: level 0 translation fault
	Data abort info:
	  ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
	  CM = 0, WnR = 1, TnD = 0, TagAccess = 0
	  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
	[000bb9841bcac70e] address between user and kernel address ranges
	Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
	google-cdd 538c004.gcdd: context saved(CPU:1)
	item - log_kevents is disabled
	Modules linked in: ... rust_binder
	CPU: 1 UID: 0 PID: 2092 Comm: kworker/1:178 Tainted: G S      W  OE      6.12.52-android16-5-g98debd5df505-4k #1 f94a6367396c5488d635708e43ee0c888d230b0b
	Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
	Hardware name: MUSTANG PVT 1.0 based on LGA (DT)
	Workqueue: events _RNvXs6_NtCsdfZWD8DztAw_6kernel9workqueueINtNtNtB7_4sync3arc3ArcNtNtCs8QPsHWIn21X_16rust_binder_main7process7ProcessEINtB5_15WorkItemPointerKy0_E3runB13_ [rust_binder]
	pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
	pc : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder]
	lr : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x464/0x11f8 [rust_binder]
	sp : ffffffc09b433ac0
	x29: ffffffc09b433d30 x28: ffffff8821690000 x27: ffffffd40cbaa448
	x26: ffffff8821690000 x25: 00000000ffffffff x24: ffffff88d0376578
	x23: 0000000000000001 x22: ffffffc09b433c78 x21: ffffff88e8f9bf40
	x20: ffffff88e8f9bf40 x19: ffffff882692b000 x18: ffffffd40f10bf00
	x17: 00000000c006287d x16: 00000000c006287d x15: 00000000000003b0
	x14: 0000000000000100 x13: 000000201cb79ae0 x12: fffffffffffffff0
	x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000000
	x8 : b80bb9841bcac706 x7 : 0000000000000001 x6 : fffffffebee63f30
	x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
	x2 : 0000000000004c31 x1 : ffffff88216900c0 x0 : ffffff88e8f9bf00
	Call trace:
	 _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder bbc172b53665bbc815363b22e97e3f7e3fe971fc]
	 process_scheduled_works+0x1c4/0x45c
	 worker_thread+0x32c/0x3e8
	 kthread+0x11c/0x1c8
	 ret_from_fork+0x10/0x20
	Code: 94218d85 b4000155 a94026a8 d10102a0 (f9000509)
	---[ end trace 0000000000000000 ]---

Thus, modify Node::release to pop items directly off the original list.

Cc: stable@vger.kernel.org
Fixes: eafedbc ("rust_binder: add Rust Binder driver")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://patch.msgid.link/20251111-binder-fix-list-remove-v1-1-8ed14a0da63d@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
commit a51f025 upstream.

Syzbot identified an issue [1] in pcl818_ai_cancel(), which stems from
the fact that in case of early device detach via pcl818_detach(),
subdevice dev->read_subdev may not have initialized its pointer to
&struct comedi_async as intended. Thus, any such dereferencing of
&s->async->cmd will lead to general protection fault and kernel crash.

Mitigate this problem by removing a call to pcl818_ai_cancel() from
pcl818_detach() altogether. This way, if the subdevice setups its
support for async commands, everything async-related will be
handled via subdevice's own ->cancel() function in
comedi_device_detach_locked() even before pcl818_detach(). If no
support for asynchronous commands is provided, there is no need
to cancel anything either.

[1] Syzbot crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 6050 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:pcl818_ai_cancel+0x69/0x3f0 drivers/comedi/drivers/pcl818.c:762
...
Call Trace:
 <TASK>
 pcl818_detach+0x66/0xd0 drivers/comedi/drivers/pcl818.c:1115
 comedi_device_detach_locked+0x178/0x750 drivers/comedi/drivers.c:207
 do_devconfig_ioctl drivers/comedi/comedi_fops.c:848 [inline]
 comedi_unlocked_ioctl+0xcde/0x1020 drivers/comedi/comedi_fops.c:2178
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
...

Reported-by: syzbot+fce5d9d5bd067d6fbe9b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fce5d9d5bd067d6fbe9b
Fixes: 00aba6e ("staging: comedi: pcl818: remove 'neverending_ai' from private data")
Cc: stable <stable@kernel.org>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://patch.msgid.link/20251023141457.398685-1-n.zhandarovich@fintech.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
…sizes

The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-4K
pages - for instance, currently, Loongson 3A5000/6000 devices (of the
LoongArch architecture) commonly uses 16K kernel pages.

Per my testing Intel Xe/Arc families of GPUs works on at least
Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable
BAR" were enabled in the EFI firmware settings. I tested this patch series
on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: <stable@vger.kernel.org>
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Haien Liang <27873200@qq.com>
Tested-by: Shirong Liu <lsr1024@qq.com>
Tested-by: Haofeng Wu <s2600cw2@126.com>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <429839446@qq.com>
Signed-off-by: Shang Yatsen <429839446@qq.com>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>

[Mingcong Bai: Resolved a minor merge conflict post-6.16 in
 drivers/gpu/drm/xe/xe_bo.c]

Link: https://lore.kernel.org/all/20250613-upstream-xe-non-4k-v2-v2-1-934f82249f8a@aosc.io/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
While testing my ROCm port for LoongArch and AArch64 (patches pending) on
the following platforms:

- LoongArch ...
  - Loongson AC612A0_V1.1 (Loongson 3C6000/S) + AMD Radeon RX 6800
- AArch64 ...
  - FD30M51 (Phytium FT-D3000) + AMD Radeon RX 7600
  - Huawei D920S10 (Huawei Kunpeng 920) + AMD Radeon RX 7600

When HSA_AMD_SVM is enabled, amdgpu would fail to initialise at all on
LoongArch (no output):

  amdgpu 0000:0d:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
  CPU 0 Unable to handle kernel paging request at virtual address ffffffffff800034, era == 9000000001058044, ra == 9000000001058660
  Oops[#1]:
  CPU: 0 UID: 0 PID: 202 Comm: kworker/0:3 Not tainted 6.16.0+ torvalds#103 PREEMPT(full)
  Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
  Workqueue: events work_for_cpu_fn
  pc 9000000001058044 ra 9000000001058660 tp 9000000101500000 sp 9000000101503aa0
  a0 ffffffffff800000 a1 0000000ffffe0000 a2 0000000000000000 a3 90000001207c58e0
  a4 9000000001a4c310 a5 0000000000000001 a6 0000000000000000 a7 0000000000000001
  t0 000003ffff800000 t1 0000000000000001 t2 0000040000000000 t3 03ffff0000002000
  t4 0000000000000000 t5 0001010101010101 t6 ffff800000000000 t7 0001000000000000
  t8 000000000000002f u0 0000000000800000 s9 9000000002026000 s0 90000001207c58e0
  s1 0000000000000001 s2 9000000001935c40 s3 0000001000000000 s4 0000000000000001
  s5 0000000ffffe0000 s6 0000000000000040 s7 0001000000000001 s8 0001000000000000
     ra: 9000000001058660 memmap_init_zone_device+0x120/0x1b0
    ERA: 9000000001058044 __init_zone_device_page.constprop.0+0x4/0x1a0
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 00000004 (PPLV0 +PIE -PWE)
   EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00020000 [PIS] (IS= ECode=2 EsubCode=0)
   BADV: ffffffffff800034
   PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
  Modules linked in: amdgpu(+) vfat fat cfg80211 rfkill 8021q garp stp mrp llc snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic drm_client_lib drm_ttm_helper syscopyarea ttm sysfillrect sysimgblt fb_sys_fops drm_panel_backlight_quirks video drm_exec drm_suballoc_helper amdxcp mfd_core drm_buddy gpu_sched drm_display_helper drm_kms_helper cec snd_hda_intel ipmi_ssif snd_intel_dspcfg snd_hda_codec snd_hda_core acpi_ipmi snd_hwdep snd_pcm fb loongson3_cpufreq lcd igc snd_timer ipmi_si spi_loongson_pci spi_loongson_core snd ipmi_devintf soundcore ipmi_msghandler binfmt_misc fuse drm drm_panel_orientation_quirks backlight dm_mod dax nfnetlink
  Process kworker/0:3 (pid: 202, threadinfo=00000000eb7cd5d6, task=000000004ca22b1b)
  Stack : 0000000000001440 0000000000000000 ffffffffff800000 0000000000000001
          90000000020b5978 9000000101503b38 0000000000000001 0000000000000001
          0000000000000000 90000000020b5978 90000000020b3f48 0000000000001440
          0000000000000000 90000001207c58e0 90000001207c5970 9000000000575e20
          90000000010e2e00 90000000020b3f48 900000000205c238 0000000000000000
          00000000000001d3 90000001207c58e0 9000000001958f28 9000000120790848
          90000001207b3510 0000000000000000 9000000120780000 9000000120780010
          90000001207d6000 90000001207c58e0 90000001015660c8 9000000120780000
          0000000000000000 90000000005763a8 90000001207c58e0 00000003ff000000
          9000000120780000 ffff80000296b820 900000012078f968 90000001207c6000
          ...
  Call Trace:
  [<9000000001058044>] __init_zone_device_page.constprop.0+0x4/0x1a0
  [<900000000105865c>] memmap_init_zone_device+0x11c/0x1b0
  [<9000000000575e1c>] memremap_pages+0x24c/0x7b0
  [<90000000005763a4>] devm_memremap_pages+0x24/0x80
  [<ffff80000296b81c>] kgd2kfd_init_zone_device+0x11c/0x220 [amdgpu]
  [<ffff80000265d09c>] amdgpu_device_init+0x27dc/0x2bf0 [amdgpu]
  [<ffff80000265ece8>] amdgpu_driver_load_kms+0x18/0x90 [amdgpu]
  [<ffff800002651fbc>] amdgpu_pci_probe+0x22c/0x890 [amdgpu]
  [<9000000000916adc>] local_pci_probe+0x3c/0xb0
  [<90000000002976c8>] work_for_cpu_fn+0x18/0x30
  [<900000000029aeb4>] process_one_work+0x164/0x320
  [<900000000029b96c>] worker_thread+0x37c/0x4a0
  [<90000000002a695c>] kthread+0x12c/0x220
  [<9000000001055b64>] ret_from_kernel_thread+0x24/0xc0
  [<9000000000237524>] ret_from_kernel_thread_asm+0xc/0x88

  Code: 00000000  00000000  0280040d <2980d08d> 02bffc0e  2980c08e  02c0208d  29c0208d  1400004f

  ---[ end trace 0000000000000000 ]---

Or lock up and/or driver reset during computate tasks, such as when
running llama.cpp over ROCm, at which point the compute process must be
killed before the reset could complete:

  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1202
  amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 3
  amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1004
  amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 2
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 1
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 0
  amdgpu: Failed to quiesce KFD
  amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
  amdgpu 0000:0a:00.0: amdgpu: Dumping IP State Completed
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MODE1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume

Disabling the aforementioned option makes the issue go away, though it is
unclear whether this is a platform-specific issue or one that lies within
the amdkfd code.

This patch has been tested on all the aforementioned platform
combinations, and sent as an RFC to encourage discussion.

Signed-off-by: Zhang Yuhao <xinmu@xinmu.moe>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Mingcong Bai <jeffbai@aosc.io>

Link: https://lore.kernel.org/all/20250814032153.227285-1-jeffbai@aosc.io/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>

Bug: https://lore.kernel.org/all/b8da4aec-4cca-4eb0-ba87-5f8641aa2ca9@leemhuis.info/
Signed-off-by: Kexy Biscuit <kexybiscuit@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 23, 2025
… 3C6000 series steppings

Older steppings of the Loongson 3C6000 series incorrectly report the
supported link speeds on their PCIe bridges (device IDs 3c19, 3c29) as
only 2.5 GT/s, despite the upstream bus supporting speeds from 2.5 GT/s
up to 16 GT/s.

As a result, certain PCIe devices would be incorrectly probed as a Gen1-
only, even if higher link speeds are supported, harming performance and
prevents dynamic link speed functionality from being enabled in drivers
such as amdgpu.

Manually override the `supported_speeds` field for affected PCIe bridges
with those found on the upstream bus to correctly reflect the supported
link speeds.

This patch was originally found from AOSC OS[1].

Link: AOSC-Tracking#2 #1
Tested-by: Lain Fearyncess Yang <fsf@live.com>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Ziyao <liziyao@uniontech.com>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
[Xi Ruoyao: Fix falling through logic and add kernel log output.]
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit 78b4d64 ]

A timer that expires a vgem fence automatically in 10 seconds is now
released with timer_delete_sync() from fence->ops.release() called on last
dma_fence_put().  In some scenarios, it can run in IRQ context, which is
not safe unless TIMER_IRQSAFE is used.  One potentially risky scenario was
demonstrated in Intel DRM CI trybot, BAT run on machine bat-adlp-6, while
working on new IGT subtests syncobj_timeline@stress-* as user space
replacements of some problematic test cases of a dma-fence-chain selftest
[1].

[117.004338] ================================
[117.004340] WARNING: inconsistent lock state
[117.004342] 6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 Tainted: G S   U
[117.004346] --------------------------------
[117.004347] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[117.004349] swapper/0/0 [HC1[1]:SC1[1]:HE0:SE0] takes:
[117.004352] ffff888138f86aa8 ((&fence->timer)){?.-.}-{0:0}, at: __timer_delete_sync+0x4b/0x190
[117.004361] {HARDIRQ-ON-W} state was registered at:
[117.004363]   lock_acquire+0xc4/0x2e0
[117.004366]   call_timer_fn+0x80/0x2a0
[117.004368]   __run_timers+0x231/0x310
[117.004370]   run_timer_softirq+0x76/0xe0
[117.004372]   handle_softirqs+0xd4/0x4d0
[117.004375]   __irq_exit_rcu+0x13f/0x160
[117.004377]   irq_exit_rcu+0xe/0x20
[117.004379]   sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004382]   asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004385]   cpuidle_enter_state+0x12b/0x8a0
[117.004388]   cpuidle_enter+0x2e/0x50
[117.004393]   call_cpuidle+0x22/0x60
[117.004395]   do_idle+0x1fd/0x260
[117.004398]   cpu_startup_entry+0x29/0x30
[117.004401]   start_secondary+0x12d/0x160
[117.004404]   common_startup_64+0x13e/0x141
[117.004407] irq event stamp: 2282669
[117.004409] hardirqs last  enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80
[117.004414] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0
[117.004419] softirqs last  enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18
[117.004423] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160
[117.004426]
other info that might help us debug this:
[117.004429]  Possible unsafe locking scenario:
[117.004432]        CPU0
[117.004433]        ----
[117.004434]   lock((&fence->timer));
[117.004436]   <Interrupt>
[117.004438]     lock((&fence->timer));
[117.004440]
 *** DEADLOCK ***
[117.004443] 1 lock held by swapper/0/0:
[117.004445]  #0: ffffc90000003d50 ((&fence->timer)){?.-.}-{0:0}, at: call_timer_fn+0x7a/0x2a0
[117.004450]
stack backtrace:
[117.004453] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S   U              6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary)
[117.004455] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
[117.004455] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
[117.004456] Call Trace:
[117.004456]  <IRQ>
[117.004457]  dump_stack_lvl+0x91/0xf0
[117.004460]  dump_stack+0x10/0x20
[117.004461]  print_usage_bug.part.0+0x260/0x360
[117.004463]  mark_lock+0x76e/0x9c0
[117.004465]  ? register_lock_class+0x48/0x4a0
[117.004467]  __lock_acquire+0xbc3/0x2860
[117.004469]  lock_acquire+0xc4/0x2e0
[117.004470]  ? __timer_delete_sync+0x4b/0x190
[117.004472]  ? __timer_delete_sync+0x4b/0x190
[117.004473]  __timer_delete_sync+0x68/0x190
[117.004474]  ? __timer_delete_sync+0x4b/0x190
[117.004475]  timer_delete_sync+0x10/0x20
[117.004476]  vgem_fence_release+0x19/0x30 [vgem]
[117.004478]  dma_fence_release+0xc1/0x3b0
[117.004480]  ? dma_fence_release+0xa1/0x3b0
[117.004481]  dma_fence_chain_release+0xe7/0x130
[117.004483]  dma_fence_release+0xc1/0x3b0
[117.004484]  ? _raw_spin_unlock_irqrestore+0x27/0x80
[117.004485]  dma_fence_chain_irq_work+0x59/0x80
[117.004487]  irq_work_single+0x75/0xa0
[117.004490]  irq_work_run_list+0x33/0x60
[117.004491]  irq_work_run+0x18/0x40
[117.004493]  __sysvec_irq_work+0x35/0x170
[117.004494]  sysvec_irq_work+0x47/0xc0
[117.004496]  asm_sysvec_irq_work+0x1b/0x20
[117.004497] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
[117.004499] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3
[117.004499] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246
[117.004500] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000
[117.004501] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004502] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000
[117.004502] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
[117.004502] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80
[117.004506]  dma_fence_signal+0x49/0xb0
[117.004507]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004508]  vgem_fence_timeout+0x12/0x20 [vgem]
[117.004509]  call_timer_fn+0xa1/0x2a0
[117.004512]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004513]  __run_timers+0x231/0x310
[117.004514]  ? tmigr_handle_remote+0x2ac/0x560
[117.004517]  timer_expire_remote+0x46/0x70
[117.004518]  tmigr_handle_remote+0x433/0x560
[117.004520]  ? __run_timers+0x239/0x310
[117.004521]  ? run_timer_softirq+0x21/0xe0
[117.004522]  ? lock_release+0xce/0x2a0
[117.004524]  run_timer_softirq+0xcf/0xe0
[117.004525]  handle_softirqs+0xd4/0x4d0
[117.004526]  __irq_exit_rcu+0x13f/0x160
[117.004527]  irq_exit_rcu+0xe/0x20
[117.004528]  sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004529]  </IRQ>
[117.004529]  <TASK>
[117.004529]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004530] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0
[117.004532] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00
[117.004532] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246
[117.004533] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000
[117.004533] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004534] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000
[117.004534] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80
[117.004534] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b
[117.004537]  ? cpuidle_enter_state+0x125/0x8a0
[117.004538]  ? sched_clock_noinstr+0x9/0x10
[117.004540]  cpuidle_enter+0x2e/0x50
[117.004542]  call_cpuidle+0x22/0x60
[117.004542]  do_idle+0x1fd/0x260
[117.004544]  cpu_startup_entry+0x29/0x30
[117.004546]  rest_init+0x104/0x200
[117.004548]  start_kernel+0x93d/0xbd0
[117.004550]  ? load_ucode_intel_bsp+0x2a/0x90
[117.004551]  ? sme_unmap_bootdata+0x14/0x80
[117.004554]  x86_64_start_reservations+0x18/0x30
[117.004555]  x86_64_start_kernel+0xfd/0x150
[117.004556]  ? soft_restart_cpu+0x14/0x14
[117.004558]  common_startup_64+0x13e/0x141
[117.004560]  </TASK>
[117.004565] ------------[ cut here ]------------
[117.004692] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1610 __timer_delete_sync+0x126/0x190
[117.004697] Modules linked in: vgem snd_hda_codec_intelhdmi snd_hda_codec_hdmi i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit hid_sensor_custom hid_sensor_hub hid_generic intel_ishtp_hid hid intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp cmdlinepart ee1004 r8153_ecm spi_nor coretemp cdc_ether mei_pxp mei_hdcp usbnet mtd intel_rapl_msr wmi_bmof kvm_intel snd_hda_intel snd_intel_dspcfg processor_thermal_device_pci kvm snd_hda_codec processor_thermal_device irqbypass processor_thermal_wt_hint polyval_clmulni platform_temperature_control snd_hda_core ghash_clmulni_intel processor_thermal_rfim spi_pxa2xx_platform snd_hwdep aesni_intel processor_thermal_rapl dw_dmac snd_pcm dw_dmac_core intel_rapl_common r8152 rapl mii intel_cstate spi_pxa2xx_core i2c_i801 processor_thermal_wt_req snd_timer i2c_mux mei_me intel_ish_ipc processor_thermal_power_floor e1000e snd i2c_smbus spi_intel_pci processor_thermal_mbox mei soundcore intel_ishtp thunderbolt idma64
[117.004733]  spi_intel int340x_thermal_zone igen6_edac binfmt_misc intel_skl_int3472_tps68470 intel_pmc_core tps68470_regulator video clk_tps68470 pmt_telemetry pmt_discovery nls_iso8859_1 pmt_class intel_pmc_ssram_telemetry intel_skl_int3472_discrete int3400_thermal intel_hid intel_skl_int3472_common acpi_thermal_rel intel_vsec wmi pinctrl_tigerlake acpi_tad sparse_keymap acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
[117.004782] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S   U              6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary)
[117.004787] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
[117.004789] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
[117.004793] RIP: 0010:__timer_delete_sync+0x126/0x190
[117.004795] Code: 31 c0 45 31 c9 c3 cc cc cc cc 48 8b 75 d0 45 84 f6 74 63 49 c7 45 18 00 00 00 00 48 89 c7 e8 51 46 39 01 f3 90 e9 66 ff ff ff <0f> 0b e9 5f ff ff ff e8 ee e4 0c 00 49 8d 5d 28 45 31 c9 31 c9 4c
[117.004801] RSP: 0018:ffffc90000003a40 EFLAGS: 00010046
[117.004804] RAX: ffffffff815093fb RBX: ffff888138f86aa8 RCX: 0000000000000000
[117.004807] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004809] RBP: ffffc90000003a70 R08: 0000000000000000 R09: 0000000000000000
[117.004812] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815093fb
[117.004814] R13: ffff888138f86a80 R14: 0000000000000000 R15: 0000000000000000
[117.004817] FS:  0000000000000000(0000) GS:ffff88890b0f7000(0000) knlGS:0000000000000000
[117.004820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[117.004823] CR2: 00005db8131eb7f0 CR3: 0000000003448000 CR4: 0000000000f52ef0
[117.004826] PKRU: 55555554
[117.004827] Call Trace:
[117.004829]  <IRQ>
[117.004831]  timer_delete_sync+0x10/0x20
[117.004833]  vgem_fence_release+0x19/0x30 [vgem]
[117.004836]  dma_fence_release+0xc1/0x3b0
[117.004838]  ? dma_fence_release+0xa1/0x3b0
[117.004841]  dma_fence_chain_release+0xe7/0x130
[117.004844]  dma_fence_release+0xc1/0x3b0
[117.004847]  ? _raw_spin_unlock_irqrestore+0x27/0x80
[117.004850]  dma_fence_chain_irq_work+0x59/0x80
[117.004853]  irq_work_single+0x75/0xa0
[117.004857]  irq_work_run_list+0x33/0x60
[117.004860]  irq_work_run+0x18/0x40
[117.004863]  __sysvec_irq_work+0x35/0x170
[117.004865]  sysvec_irq_work+0x47/0xc0
[117.004868]  asm_sysvec_irq_work+0x1b/0x20
[117.004871] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
[117.004874] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3
[117.004879] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246
[117.004882] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000
[117.004884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004887] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000
[117.004890] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
[117.004892] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80
[117.004897]  dma_fence_signal+0x49/0xb0
[117.004899]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004902]  vgem_fence_timeout+0x12/0x20 [vgem]
[117.004904]  call_timer_fn+0xa1/0x2a0
[117.004908]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004910]  __run_timers+0x231/0x310
[117.004913]  ? tmigr_handle_remote+0x2ac/0x560
[117.004917]  timer_expire_remote+0x46/0x70
[117.004919]  tmigr_handle_remote+0x433/0x560
[117.004923]  ? __run_timers+0x239/0x310
[117.004925]  ? run_timer_softirq+0x21/0xe0
[117.004928]  ? lock_release+0xce/0x2a0
[117.004931]  run_timer_softirq+0xcf/0xe0
[117.004933]  handle_softirqs+0xd4/0x4d0
[117.004936]  __irq_exit_rcu+0x13f/0x160
[117.004938]  irq_exit_rcu+0xe/0x20
[117.004940]  sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004943]  </IRQ>
[117.004944]  <TASK>
[117.004946]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004949] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0
[117.004953] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00
[117.004961] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246
[117.004963] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000
[117.004966] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004968] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000
[117.004971] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80
[117.004974] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b
[117.004978]  ? cpuidle_enter_state+0x125/0x8a0
[117.004981]  ? sched_clock_noinstr+0x9/0x10
[117.004985]  cpuidle_enter+0x2e/0x50
[117.004989]  call_cpuidle+0x22/0x60
[117.004991]  do_idle+0x1fd/0x260
[117.005001]  cpu_startup_entry+0x29/0x30
[117.005004]  rest_init+0x104/0x200
[117.005008]  start_kernel+0x93d/0xbd0
[117.005011]  ? load_ucode_intel_bsp+0x2a/0x90
[117.005014]  ? sme_unmap_bootdata+0x14/0x80
[117.005017]  x86_64_start_reservations+0x18/0x30
[117.005020]  x86_64_start_kernel+0xfd/0x150
[117.005023]  ? soft_restart_cpu+0x14/0x14
[117.005026]  common_startup_64+0x13e/0x141
[117.005030]  </TASK>
[117.005032] irq event stamp: 2282669
[117.005034] hardirqs last  enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80
[117.005038] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0
[117.005043] softirqs last  enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18
[117.005047] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160
[117.005051] ---[ end trace 0000000000000000 ]---

Make the timer IRQ safe.

[1] https://patchwork.freedesktop.org/series/154987/#rev2

Fixes: 4077798 ("drm/vgem: Attach sw fences to exported vGEM dma-buf (ioctl)")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250926152628.2165080-2-janusz.krzysztofik@linux.intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      torvalds#6 0x458090 in record__synthesize builtin-record.c:2075
      torvalds#7 0x45a85a in __cmd_record builtin-record.c:2888
      torvalds#8 0x45deb6 in cmd_record builtin-record.c:4374
      torvalds#9 0x4e5e33 in run_builtin perf.c:349
      torvalds#10 0x4e60bf in handle_internal_command perf.c:401
      torvalds#11 0x4e6215 in run_argv perf.c:448
      torvalds#12 0x4e653a in main perf.c:555
      torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      torvalds#14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit 23b2d2f ]

When booting with KASAN enabled the following splat is
encountered during probe of the k1 clock driver:

UBSAN: array-index-out-of-bounds in drivers/clk/spacemit/ccu-k1.c:1044:16
index 0 is out of range for type 'clk_hw *[*]'
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc5+ #1 PREEMPT(lazy)
Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.10spacemit 10/01/2022
Call Trace:
[<ffffffff8002b628>] dump_backtrace+0x28/0x38
[<ffffffff800027d2>] show_stack+0x3a/0x50
[<ffffffff800220c2>] dump_stack_lvl+0x5a/0x80
[<ffffffff80022100>] dump_stack+0x18/0x20
[<ffffffff800164b8>] ubsan_epilogue+0x10/0x48
[<ffffffff8099034e>] __ubsan_handle_out_of_bounds+0xa6/0xa8
[<ffffffff80acbfa6>] k1_ccu_probe+0x37e/0x420
[<ffffffff80b79e6e>] platform_probe+0x56/0x98
[<ffffffff80b76a7e>] really_probe+0x9e/0x350
[<ffffffff80b76db0>] __driver_probe_device+0x80/0x138
[<ffffffff80b76f52>] driver_probe_device+0x3a/0xd0
[<ffffffff80b771c4>] __driver_attach+0xac/0x1b8
[<ffffffff80b742fc>] bus_for_each_dev+0x6c/0xc8
[<ffffffff80b76296>] driver_attach+0x26/0x38
[<ffffffff80b759ae>] bus_add_driver+0x13e/0x268
[<ffffffff80b7836a>] driver_register+0x52/0x100
[<ffffffff80b79a78>] __platform_driver_register+0x28/0x38
[<ffffffff814585da>] k1_ccu_driver_init+0x22/0x38
[<ffffffff80023a8a>] do_one_initcall+0x62/0x2a0
[<ffffffff81401c60>] do_initcalls+0x170/0x1a8
[<ffffffff81401e7a>] kernel_init_freeable+0x16a/0x1e0
[<ffffffff811f7534>] kernel_init+0x2c/0x180
[<ffffffff80025f56>] ret_from_fork_kernel+0x16/0x1d8
[<ffffffff81205336>] ret_from_fork_kernel_asm+0x16/0x18
---[ end trace ]---

This is bogus and is simply a result of KASAN consulting the
`.num` member of the struct for bounds information (as it should
due to `__counted_by`) and finding 0 set by kzalloc() because it
has not been initialized before the loop that fills in the array.
The easy fix is to just move the line that sets `num` to before
the loop that fills the array so that KASAN has the information
it needs to accurately conclude that the access is valid.

Fixes: 1b72c59 ("clk: spacemit: Add clock support for SpacemiT K1 SoC")
Tested-by: Yanko Kaneti <yaneti@declera.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Reviewed-by: Alex Elder <elder@riscstar.com>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit 385aab8 ]

MT7996 driver can use both wed and wed_hif2 devices to offload traffic
from/to the wireless NIC. In the current codebase we assume to always
use the primary wed device in wed callbacks resulting in the following
crash if the hw runs wed_hif2 (e.g. 6GHz link).

[  297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a
[  297.464928] Mem abort info:
[  297.467722]   ESR = 0x0000000096000005
[  297.471461]   EC = 0x25: DABT (current EL), IL = 32 bits
[  297.476766]   SET = 0, FnV = 0
[  297.479809]   EA = 0, S1PTW = 0
[  297.482940]   FSC = 0x05: level 1 translation fault
[  297.487809] Data abort info:
[  297.490679]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  297.496156]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  297.501196]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000
[  297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000
[  297.523532] Internal error: Oops: 0000000096000005 [#1] SMP
[  297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G           O       6.12.50 #0
[  297.723908] Tainted: [O]=OOT_MODULE
[  297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT)
[  297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table]
[  297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76]
[  297.752688] lr : mtk_wed_flow_remove+0x58/0x80
[  297.757126] sp : ffffffc080fe3ae0
[  297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7
[  297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00
[  297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018
[  297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000
[  297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000
[  297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da
[  297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200
[  297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002
[  297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000
[  297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8
[  297.831686] Call trace:
[  297.834123]  mt76_wed_offload_disable+0x64/0xa0 [mt76]
[  297.839254]  mtk_wed_flow_remove+0x58/0x80
[  297.843342]  mtk_flow_offload_cmd+0x434/0x574
[  297.847689]  mtk_wed_setup_tc_block_cb+0x30/0x40
[  297.852295]  nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table]
[  297.858466]  nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table]
[  297.864463]  process_one_work+0x174/0x300
[  297.868465]  worker_thread+0x278/0x430
[  297.872204]  kthread+0xd8/0xdc
[  297.875251]  ret_from_fork+0x10/0x20
[  297.878820] Code: 928b5ae0 8b000273 91400a6 f943fa61 (79401421)
[  297.884901] ---[ end trace 0000000000000000 ]---

Fix the issue detecting the proper wed reference to use running wed
callabacks.

Fixes: 83eafc9 ("wifi: mt76: mt7996: add wed tx support")
Tested-by: Daniel Pawlik <pawlik.dan@gmail.com>
Tested-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251008-wed-fixes-v1-1-8f7678583385@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit 1f73a56 ]

Neither sock4 nor sock6 pointers are guaranteed to be non-NULL in
vxlan_xmit_one, e.g. if the iface is brought down. This can lead to the
following NULL dereference:

  BUG: kernel NULL pointer dereference, address: 0000000000000010
  Oops: Oops: 0000 [#1] SMP NOPTI
  RIP: 0010:vxlan_xmit_one+0xbb3/0x1580
  Call Trace:
   vxlan_xmit+0x429/0x610
   dev_hard_start_xmit+0x55/0xa0
   __dev_queue_xmit+0x6d0/0x7f0
   ip_finish_output2+0x24b/0x590
   ip_output+0x63/0x110

Mentioned commits changed the code path in vxlan_xmit_one and as a side
effect the sock4/6 pointer validity checks in vxlan(6)_get_route were
lost. Fix this by adding back checks.

Since both commits being fixed were released in the same version (v6.7)
and are strongly related, bundle the fixes in a single commit.

Reported-by: Liang Li <liali@redhat.com>
Fixes: 6f19b2c ("vxlan: use generic function for tunnel IPv4 route lookup")
Fixes: 2aceb89 ("vxlan: use generic function for tunnel IPv6 route lookup")
Cc: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251126102627.74223-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
…ne() and smb_direct_cm_handler()

[ Upstream commit 425c327 ]

Namjae reported the following:

I have a simple file copy test with windows 11 client, and get the
following error message.

[  894.140312] ------------[ cut here ]------------
[  894.140316] WARNING: CPU: 1 PID: 116 at
fs/smb/server/transport_rdma.c:642 recv_done+0x308/0x360 [ksmbd]
[  894.140335] Modules linked in: ksmbd cmac nls_utf8 nls_ucs2_utils
libarc4 nls_iso8859_1 snd_hda_codec_intelhdmi snd_hda_codec_hdmi
snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic
rpcrdma intel_rapl_msr rdma_ucm intel_rapl_common snd_hda_intel
ib_iser snd_hda_codec intel_uncore_frequency
intel_uncore_frequency_common snd_hda_core intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg libiscsi
snd_intel_sdw_acpi coretemp scsi_transport_iscsi snd_hwdep kvm_intel
i915 snd_pcm ib_umad rdma_cm snd_seq_midi ib_ipoib kvm
snd_seq_midi_event iw_cm snd_rawmidi ghash_clmulni_intel ib_cm
aesni_intel snd_seq mei_hdcp drm_buddy rapl snd_seq_device eeepc_wmi
asus_wmi snd_timer intel_cstate ttm snd drm_client_lib
drm_display_helper sparse_keymap soundcore platform_profile mxm_wmi
wmi_bmof joydev mei_me cec acpi_pad mei rc_core drm_kms_helper
input_leds i2c_algo_bit mac_hid sch_fq_codel msr parport_pc ppdev lp
nfsd parport auth_rpcgss binfmt_misc nfs_acl lockd grace drm sunrpc
ramoops efi_pstore
[  894.140414]  reed_solomon pstore_blk pstore_zone autofs4 btrfs
blake2b_generic xor raid6_pq mlx5_ib ib_uverbs ib_core hid_generic uas
usbhid hid r8169 i2c_i801 usb_storage i2c_mux i2c_smbus mlx5_core
realtek ahci mlxfw psample libahci video wmi [last unloaded: ksmbd]
[  894.140442] CPU: 1 UID: 0 PID: 116 Comm: kworker/1:1H Tainted: G
    W           6.18.0-rc5+ #1 PREEMPT(voluntary)
[  894.140447] Tainted: [W]=WARN
[  894.140448] Hardware name: System manufacturer System Product
Name/H110M-K, BIOS 3601 12/12/2017
[  894.140450] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[  894.140476] RIP: 0010:recv_done+0x308/0x360 [ksmbd]
[  894.140487] Code: 2e f2 ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc
cc cc cc 41 8b 55 10 49 8b 75 08 b9 02 00 00 00 e8 ed f4 f2 c3 e9 59
fd ff ff <0f> 0b e9 02 ff ff ff 49 8b 74 24 28 49 8d 94 24 c8 00 00 00
bf 00
[  894.140490] RSP: 0018:ffffa47ec03f3d78 EFLAGS: 00010293
[  894.140492] RAX: 0000000000000001 RBX: ffff8eb84c818000 RCX: 000000010002ba00
[  894.140494] RDX: 0000000037600001 RSI: 0000000000000083 RDI: ffff8eb92ec9ee40
[  894.140496] RBP: ffffa47ec03f3da0 R08: 0000000000000000 R09: 0000000000000010
[  894.140498] R10: ffff8eb801705680 R11: fefefefefefefeff R12: ffff8eb7454b8810
[  894.140499] R13: ffff8eb746deb988 R14: ffff8eb746deb980 R15: ffff8eb84c818000
[  894.140501] FS:  0000000000000000(0000) GS:ffff8eb9a7355000(0000)
knlGS:0000000000000000
[  894.140503] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  894.140505] CR2: 00002d9401d60018 CR3: 0000000010a40006 CR4: 00000000003726f0
[  894.140507] Call Trace:
[  894.140509]  <TASK>
[  894.140512]  __ib_process_cq+0x8e/0x190 [ib_core]
[  894.140530]  ib_cq_poll_work+0x2f/0x90 [ib_core]
[  894.140545]  process_scheduled_works+0xd4/0x430
[  894.140554]  worker_thread+0x12a/0x270
[  894.140558]  kthread+0x10d/0x250
[  894.140564]  ? __pfx_worker_thread+0x10/0x10
[  894.140567]  ? __pfx_kthread+0x10/0x10
[  894.140571]  ret_from_fork+0x11a/0x160
[  894.140574]  ? __pfx_kthread+0x10/0x10
[  894.140577]  ret_from_fork_asm+0x1a/0x30
[  894.140584]  </TASK>
[  894.140585] ---[ end trace 0000000000000000 ]---
[  894.154363] ------------[ cut here ]------------
[  894.154367] WARNING: CPU: 3 PID: 5543 at
fs/smb/server/transport_rdma.c:1728 smb_direct_cm_handler+0x121/0x130
[ksmbd]
[  894.154384] Modules linked in: ksmbd cmac nls_utf8 nls_ucs2_utils
libarc4 nls_iso8859_1 snd_hda_codec_intelhdmi snd_hda_codec_hdmi
snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic
rpcrdma intel_rapl_msr rdma_ucm intel_rapl_common snd_hda_intel
ib_iser snd_hda_codec intel_uncore_frequency
intel_uncore_frequency_common snd_hda_core intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg libiscsi
snd_intel_sdw_acpi coretemp scsi_transport_iscsi snd_hwdep kvm_intel
i915 snd_pcm ib_umad rdma_cm snd_seq_midi ib_ipoib kvm
snd_seq_midi_event iw_cm snd_rawmidi ghash_clmulni_intel ib_cm
aesni_intel snd_seq mei_hdcp drm_buddy rapl snd_seq_device eeepc_wmi
asus_wmi snd_timer intel_cstate ttm snd drm_client_lib
drm_display_helper sparse_keymap soundcore platform_profile mxm_wmi
wmi_bmof joydev mei_me cec acpi_pad mei rc_core drm_kms_helper
input_leds i2c_algo_bit mac_hid sch_fq_codel msr parport_pc ppdev lp
nfsd parport auth_rpcgss binfmt_misc nfs_acl lockd grace drm sunrpc
ramoops efi_pstore
[  894.154456]  reed_solomon pstore_blk pstore_zone autofs4 btrfs
blake2b_generic xor raid6_pq mlx5_ib ib_uverbs ib_core hid_generic uas
usbhid hid r8169 i2c_i801 usb_storage i2c_mux i2c_smbus mlx5_core
realtek ahci mlxfw psample libahci video wmi [last unloaded: ksmbd]
[  894.154483] CPU: 3 UID: 0 PID: 5543 Comm: kworker/3:6 Tainted: G
    W           6.18.0-rc5+ #1 PREEMPT(voluntary)
[  894.154487] Tainted: [W]=WARN
[  894.154488] Hardware name: System manufacturer System Product
Name/H110M-K, BIOS 3601 12/12/2017
[  894.154490] Workqueue: ib_cm cm_work_handler [ib_cm]
[  894.154499] RIP: 0010:smb_direct_cm_handler+0x121/0x130 [ksmbd]
[  894.154507] Code: e7 e8 13 b1 ef ff 44 89 e1 4c 89 ee 48 c7 c7 80
d7 59 c1 48 89 c2 e8 2e 4d ef c3 31 c0 5b 41 5c 41 5d 41 5e 5d c3 cc
cc cc cc <0f> 0b eb a5 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
90 90
[  894.154510] RSP: 0018:ffffa47ec1b27c00 EFLAGS: 00010206
[  894.154512] RAX: ffffffffc1304e00 RBX: ffff8eb89ae50880 RCX: 0000000000000000
[  894.154514] RDX: ffff8eb730960000 RSI: ffffa47ec1b27c60 RDI: ffff8eb7454b9400
[  894.154515] RBP: ffffa47ec1b27c20 R08: 0000000000000002 R09: ffff8eb730b8c18b
[  894.154517] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000009
[  894.154518] R13: ffff8eb7454b9400 R14: ffff8eb7454b8810 R15: ffff8eb815c43000
[  894.154520] FS:  0000000000000000(0000) GS:ffff8eb9a7455000(0000)
knlGS:0000000000000000
[  894.154522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  894.154523] CR2: 00007fe1310e99d0 CR3: 0000000010a40005 CR4: 00000000003726f0
[  894.154525] Call Trace:
[  894.154527]  <TASK>
[  894.154530]  cma_cm_event_handler+0x27/0xd0 [rdma_cm]
[  894.154541]  cma_ib_handler+0x99/0x2e0 [rdma_cm]
[  894.154551]  cm_process_work+0x28/0xf0 [ib_cm]
[  894.154557]  cm_queue_work_unlock+0x41/0xf0 [ib_cm]
[  894.154563]  cm_work_handler+0x2eb/0x25b0 [ib_cm]
[  894.154568]  ? pwq_activate_first_inactive+0x52/0x70
[  894.154572]  ? pwq_dec_nr_in_flight+0x244/0x330
[  894.154575]  process_scheduled_works+0xd4/0x430
[  894.154579]  worker_thread+0x12a/0x270
[  894.154581]  kthread+0x10d/0x250
[  894.154585]  ? __pfx_worker_thread+0x10/0x10
[  894.154587]  ? __pfx_kthread+0x10/0x10
[  894.154590]  ret_from_fork+0x11a/0x160
[  894.154593]  ? __pfx_kthread+0x10/0x10
[  894.154596]  ret_from_fork_asm+0x1a/0x30
[  894.154602]  </TASK>
[  894.154603] ---[ end trace 0000000000000000 ]---
[  894.154931] ksmbd: smb_direct: disconnected
[  894.157278] ksmbd: smb_direct: disconnected

I guess sc->first_error is already set and sc->status
is thus unexpected, so this should avoid the WARN[_ON]_ONCE()
if sc->first_error is already set and have a usable error path.

While there set sc->first_error as soon as possible.

v1 of this patch revealed the real problem with this message:

[  309.560973] expected[NEGOTIATE_NEEDED] != RDMA_CONNECT_RUNNING
first_error=0 local=192.168.0.200:445 remote=192.168.0.100:60445
[  309.561034] WARNING: CPU: 2 PID: 78 at transport_rdma.c:643
recv_done+0x2fa/0x3d0 [ksmbd]

Some drivers (at least mlx5_ib) might post a recv completion before
RDMA_CM_EVENT_ESTABLISHED, so we need to adjust our expectation in that
case.

Fixes: e2d5e51 ("smb: server: only turn into SMBDIRECT_SOCKET_CONNECTED when negotiation is done")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Paulo Alcantara <pc@manguebit.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit ccb61a3 ]

kbd_led_set() can sleep, and so may not be used as the brightness_set()
callback.

Otherwise using this led with a trigger leads to system hangs
accompanied by:
BUG: scheduling while atomic: acpi_fakekeyd/2588/0x00000003
CPU: 4 UID: 0 PID: 2588 Comm: acpi_fakekeyd Not tainted 6.17.9+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.17.9-1
Hardware name: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B9403CVAR/B9403CVAR, BIOS B9403CVAR.311 12/24/2024
Call Trace:
 <TASK>
 [...]
 schedule_timeout+0xbd/0x100
 __down_common+0x175/0x290
 down_timeout+0x67/0x70
 acpi_os_wait_semaphore+0x57/0x90
 [...]
 asus_wmi_evaluate_method3+0x87/0x190 [asus_wmi]
 led_trigger_event+0x3f/0x60
 [...]

Fixes: 9fe44fc ("platform/x86: asus-wmi: Simplify the keyboard brightness updating process")
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Denis Benato <benato.denis96@gmail.com>
Link: https://patch.msgid.link/20251129101307.18085-3-anton@khirnov.net
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
[ Upstream commit d84e47e ]

Since commit a735831 ("drm/nouveau: vendor in drm_encoder_slave API")
nouveau appears to be broken for all dispnv04 GPUs (before NV50). Depending
on the kernel version, either having no display output and hanging in
kernel for a long time, or even oopsing in the cleanup path like:

Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
...
nouveau 0000:0a:00.0: drm: 0x14C5: Parsing digital output script table
BUG: Unable to handle kernel data access on read at 0x00041520
Faulting instruction address: 0xc0003d0001be0844
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash  SMP NR_CPUS=8 NUMA PowerMac
Modules linked in: windfarm_cpufreq_clamp windfarm_smu_sensors windfarm_smu_controls windfarm_pm112 snd_aoa_codec_onyx snd_aoa_fabric_layout snd_aoa windfarm_pid jo
 apple_mfi_fastcharge rndis_host cdc_ether usbnet mii snd_aoa_i2sbus snd_aoa_soundbus snd_pcm snd_timer snd soundcore rack_meter windfarm_smu_sat windfarm_max6690_s
m75_sensor windfarm_core gpu_sched drm_gpuvm drm_exec drm_client_lib drm_ttm_helper ttm drm_display_helper drm_kms_helper drm drm_panel_orientation_quirks syscopyar
_sys_fops i2c_algo_bit backlight uio_pdrv_genirq uio uninorth_agp agpgart zram dm_mod dax ipv6 nfsv4 dns_resolver nfs lockd grace sunrpc offb cfbfillrect cfbimgblt
ont input_leds sr_mod cdrom sd_mod uas ata_generic hid_apple hid_generic usbhid hid usb_storage pata_macio sata_svw libata firewire_ohci scsi_mod firewire_core ohci
ehci_pci ehci_hcd tg3 ohci_hcd libphy usbcore usb_common nls_base
 led_class
CPU: 0 UID: 0 PID: 245 Comm: (udev-worker) Not tainted 6.14.0-09584-g7d06015d936c torvalds#7 PREEMPTLAZY
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP:  c0003d0001be0844 LR: c0003d0001be0830 CTR: 0000000000000000
REGS: c0000000053f70e0 TRAP: 0300   Not tainted  (6.14.0-09584-g7d06015d936c)
MSR:  9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24222220  XER: 00000000
DAR: 0000000000041520 DSISR: 40000000 IRQMASK: 0 \x0aGPR00: c0003d0001be0830 c0000000053f7380 c0003d0000911900 c000000007bc6800 \x0aGPR04: 0000000000000000 0000000000000000 c000000007bc6e70 0000000000000001 \x0aGPR08: 01f3040000000000 0000000000041520 0000000000000000 c0003d0000813958 \x0aGPR12: c000000000071a48 c000000000e28000 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000001 c0003d0000928528 \x0aGPR24: c0003d0000928598 0000000000000000 c000000007025480 c000000007025480 \x0aGPR28: c0000000010b4000 0000000000000000 c000000007bc1800 c000000007bc6800
NIP [c0003d0001be0844] nv_crtc_destroy+0x44/0xd4 [nouveau]
LR [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau]
Call Trace:
[c0000000053f7380] [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] (unreliable)
[c0000000053f73c0] [c0003d00007f7bf4] drm_mode_config_cleanup+0x27c/0x30c [drm]
[c0000000053f7490] [c0003d0001bdea50] nouveau_display_create+0x1cc/0x550 [nouveau]
[c0000000053f7500] [c0003d0001bcc29c] nouveau_drm_device_init+0x1c8/0x844 [nouveau]
[c0000000053f75e0] [c0003d0001bcc9ec] nouveau_drm_probe+0xd4/0x1e0 [nouveau]
[c0000000053f7670] [c000000000557d24] local_pci_probe+0x50/0xa8
[c0000000053f76f0] [c000000000557fa8] pci_device_probe+0x22c/0x240
[c0000000053f7760] [c0000000005fff3c] really_probe+0x188/0x31c
[c0000000053f77e0] [c000000000600204] __driver_probe_device+0x134/0x13c
[c0000000053f7860] [c0000000006002c0] driver_probe_device+0x3c/0xb4
[c0000000053f78a0] [c000000000600534] __driver_attach+0x118/0x128
[c0000000053f78e0] [c0000000005fe038] bus_for_each_dev+0xa8/0xf4
[c0000000053f7950] [c0000000005ff460] driver_attach+0x2c/0x40
[c0000000053f7970] [c0000000005fea68] bus_add_driver+0x130/0x278
[c0000000053f7a00] [c00000000060117c] driver_register+0x9c/0x1a0
[c0000000053f7a80] [c00000000055623c] __pci_register_driver+0x5c/0x70
[c0000000053f7aa0] [c0003d0001c058a0] nouveau_drm_init+0x254/0x278 [nouveau]
[c0000000053f7b10] [c00000000000e9bc] do_one_initcall+0x84/0x268
[c0000000053f7bf0] [c0000000001a0ba0] do_init_module+0x70/0x2d8
[c0000000053f7c70] [c0000000001a42bc] init_module_from_file+0xb4/0x108
[c0000000053f7d50] [c0000000001a4504] sys_finit_module+0x1ac/0x478
[c0000000053f7e10] [c000000000023230] system_call_exception+0x1a4/0x20c
[c0000000053f7e50] [c00000000000c554] system_call_common+0xf4/0x258
 --- interrupt: c00 at 0xfd5f988
NIP:  000000000fd5f988 LR: 000000000ff9b148 CTR: 0000000000000000
REGS: c0000000053f7e80 TRAP: 0c00   Not tainted  (6.14.0-09584-g7d06015d936c)
MSR:  100000000000d032 <HV,EE,PR,ME,IR,DR,RI>  CR: 28222244  XER: 00000000
IRQMASK: 0 \x0aGPR00: 0000000000000161 00000000ffcdc2d0 00000000405db160 0000000000000020 \x0aGPR04: 000000000ffa2c9c 0000000000000000 000000000000001f 0000000000000045 \x0aGPR08: 0000000011a13770 0000000000000000 0000000000000000 0000000000000000 \x0aGPR12: 0000000000000000 0000000010249d8c 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000000 0000000011a11a70 \x0aGPR24: 0000000011a13580 0000000011a11950 0000000011a11a70 0000000000020000 \x0aGPR28: 000000000ffa2c9c 0000000000000000 000000000ffafc40 0000000011a11a70
NIP [000000000fd5f988] 0xfd5f988
LR [000000000ff9b148] 0xff9b148
 --- interrupt: c00
Code: f821ffc1 418200ac e93f0000 e9290038 e9291468 eba90000 48026c0d e8410018 e93f06aa 3d290001 392982a4 79291f24 <7fdd482a> 2c3e0000 41820030 7fc3f378
 ---[ end trace 0000000000000000 ]---

This is caused by the i2c encoder modules vendored into nouveau/ now
depending on the equally vendored nouveau_i2c_encoder_destroy
function. Trying to auto-load this modules hangs on nouveau
initialization until timeout, and nouveau continues without i2c video
encoders.

Fix by avoiding nouveau dependency by __always_inlining that helper
functions into those i2c video encoder modules.

Fixes: a735831 ("drm/nouveau: vendor in drm_encoder_slave API")
Signed-off-by: René Rebe <rene@exactco.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[Lyude: fixed commit reference in description]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251202.164952.2216481867721531616.rene@exactco.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
…sizes

The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-4K
pages - for instance, currently, Loongson 3A5000/6000 devices (of the
LoongArch architecture) commonly uses 16K kernel pages.

Per my testing Intel Xe/Arc families of GPUs works on at least
Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable
BAR" were enabled in the EFI firmware settings. I tested this patch series
on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: <stable@vger.kernel.org>
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Haien Liang <27873200@qq.com>
Tested-by: Shirong Liu <lsr1024@qq.com>
Tested-by: Haofeng Wu <s2600cw2@126.com>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <429839446@qq.com>
Signed-off-by: Shang Yatsen <429839446@qq.com>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>

[Mingcong Bai: Resolved a minor merge conflict post-6.16 in
 drivers/gpu/drm/xe/xe_bo.c]

Link: https://lore.kernel.org/all/20250613-upstream-xe-non-4k-v2-v2-1-934f82249f8a@aosc.io/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
While testing my ROCm port for LoongArch and AArch64 (patches pending) on
the following platforms:

- LoongArch ...
  - Loongson AC612A0_V1.1 (Loongson 3C6000/S) + AMD Radeon RX 6800
- AArch64 ...
  - FD30M51 (Phytium FT-D3000) + AMD Radeon RX 7600
  - Huawei D920S10 (Huawei Kunpeng 920) + AMD Radeon RX 7600

When HSA_AMD_SVM is enabled, amdgpu would fail to initialise at all on
LoongArch (no output):

  amdgpu 0000:0d:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
  CPU 0 Unable to handle kernel paging request at virtual address ffffffffff800034, era == 9000000001058044, ra == 9000000001058660
  Oops[#1]:
  CPU: 0 UID: 0 PID: 202 Comm: kworker/0:3 Not tainted 6.16.0+ torvalds#103 PREEMPT(full)
  Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
  Workqueue: events work_for_cpu_fn
  pc 9000000001058044 ra 9000000001058660 tp 9000000101500000 sp 9000000101503aa0
  a0 ffffffffff800000 a1 0000000ffffe0000 a2 0000000000000000 a3 90000001207c58e0
  a4 9000000001a4c310 a5 0000000000000001 a6 0000000000000000 a7 0000000000000001
  t0 000003ffff800000 t1 0000000000000001 t2 0000040000000000 t3 03ffff0000002000
  t4 0000000000000000 t5 0001010101010101 t6 ffff800000000000 t7 0001000000000000
  t8 000000000000002f u0 0000000000800000 s9 9000000002026000 s0 90000001207c58e0
  s1 0000000000000001 s2 9000000001935c40 s3 0000001000000000 s4 0000000000000001
  s5 0000000ffffe0000 s6 0000000000000040 s7 0001000000000001 s8 0001000000000000
     ra: 9000000001058660 memmap_init_zone_device+0x120/0x1b0
    ERA: 9000000001058044 __init_zone_device_page.constprop.0+0x4/0x1a0
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 00000004 (PPLV0 +PIE -PWE)
   EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00020000 [PIS] (IS= ECode=2 EsubCode=0)
   BADV: ffffffffff800034
   PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
  Modules linked in: amdgpu(+) vfat fat cfg80211 rfkill 8021q garp stp mrp llc snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic drm_client_lib drm_ttm_helper syscopyarea ttm sysfillrect sysimgblt fb_sys_fops drm_panel_backlight_quirks video drm_exec drm_suballoc_helper amdxcp mfd_core drm_buddy gpu_sched drm_display_helper drm_kms_helper cec snd_hda_intel ipmi_ssif snd_intel_dspcfg snd_hda_codec snd_hda_core acpi_ipmi snd_hwdep snd_pcm fb loongson3_cpufreq lcd igc snd_timer ipmi_si spi_loongson_pci spi_loongson_core snd ipmi_devintf soundcore ipmi_msghandler binfmt_misc fuse drm drm_panel_orientation_quirks backlight dm_mod dax nfnetlink
  Process kworker/0:3 (pid: 202, threadinfo=00000000eb7cd5d6, task=000000004ca22b1b)
  Stack : 0000000000001440 0000000000000000 ffffffffff800000 0000000000000001
          90000000020b5978 9000000101503b38 0000000000000001 0000000000000001
          0000000000000000 90000000020b5978 90000000020b3f48 0000000000001440
          0000000000000000 90000001207c58e0 90000001207c5970 9000000000575e20
          90000000010e2e00 90000000020b3f48 900000000205c238 0000000000000000
          00000000000001d3 90000001207c58e0 9000000001958f28 9000000120790848
          90000001207b3510 0000000000000000 9000000120780000 9000000120780010
          90000001207d6000 90000001207c58e0 90000001015660c8 9000000120780000
          0000000000000000 90000000005763a8 90000001207c58e0 00000003ff000000
          9000000120780000 ffff80000296b820 900000012078f968 90000001207c6000
          ...
  Call Trace:
  [<9000000001058044>] __init_zone_device_page.constprop.0+0x4/0x1a0
  [<900000000105865c>] memmap_init_zone_device+0x11c/0x1b0
  [<9000000000575e1c>] memremap_pages+0x24c/0x7b0
  [<90000000005763a4>] devm_memremap_pages+0x24/0x80
  [<ffff80000296b81c>] kgd2kfd_init_zone_device+0x11c/0x220 [amdgpu]
  [<ffff80000265d09c>] amdgpu_device_init+0x27dc/0x2bf0 [amdgpu]
  [<ffff80000265ece8>] amdgpu_driver_load_kms+0x18/0x90 [amdgpu]
  [<ffff800002651fbc>] amdgpu_pci_probe+0x22c/0x890 [amdgpu]
  [<9000000000916adc>] local_pci_probe+0x3c/0xb0
  [<90000000002976c8>] work_for_cpu_fn+0x18/0x30
  [<900000000029aeb4>] process_one_work+0x164/0x320
  [<900000000029b96c>] worker_thread+0x37c/0x4a0
  [<90000000002a695c>] kthread+0x12c/0x220
  [<9000000001055b64>] ret_from_kernel_thread+0x24/0xc0
  [<9000000000237524>] ret_from_kernel_thread_asm+0xc/0x88

  Code: 00000000  00000000  0280040d <2980d08d> 02bffc0e  2980c08e  02c0208d  29c0208d  1400004f

  ---[ end trace 0000000000000000 ]---

Or lock up and/or driver reset during computate tasks, such as when
running llama.cpp over ROCm, at which point the compute process must be
killed before the reset could complete:

  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1202
  amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 3
  amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1004
  amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 2
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 1
  amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 0
  amdgpu: Failed to quiesce KFD
  amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
  amdgpu 0000:0a:00.0: amdgpu: Dumping IP State Completed
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
  amdgpu 0000:0a:00.0: amdgpu: MODE1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset
  amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume

Disabling the aforementioned option makes the issue go away, though it is
unclear whether this is a platform-specific issue or one that lies within
the amdkfd code.

This patch has been tested on all the aforementioned platform
combinations, and sent as an RFC to encourage discussion.

Signed-off-by: Zhang Yuhao <xinmu@xinmu.moe>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Mingcong Bai <jeffbai@aosc.io>

Link: https://lore.kernel.org/all/20250814032153.227285-1-jeffbai@aosc.io/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>

Bug: https://lore.kernel.org/all/b8da4aec-4cca-4eb0-ba87-5f8641aa2ca9@leemhuis.info/
Signed-off-by: Kexy Biscuit <kexybiscuit@aosc.io>
RevySR pushed a commit that referenced this pull request Dec 24, 2025
…r Loongson 3C6000 series steppings

Older steppings of the Loongson 3C6000 series incorrectly report the
supported link speeds on their PCIe bridges (device IDs 3c19, 3c29) as
only 2.5 GT/s, despite the upstream bus supporting speeds from 2.5 GT/s
up to 16 GT/s.

As a result, certain PCIe devices would be incorrectly probed as a Gen1-
only, even if higher link speeds are supported, harming performance and
prevents dynamic link speed functionality from being enabled in drivers
such as amdgpu.

Manually override the `supported_speeds` field for affected PCIe bridges
with those found on the upstream bus to correctly reflect the supported
link speeds.

This patch was originally found from AOSC OS[1].

Link: AOSC-Tracking#2 #1
Tested-by: Lain Fearyncess Yang <fsf@live.com>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
[Xi Ruoyao: Fix falling through logic and add kernel log output.]
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Link: AOSC-Tracking@4392f44
[Ziyao Li: move from drivers/pci/quirks.c to drivers/pci/controller/pci-loongson.c]
Signed-off-by: Ziyao Li <liziyao@uniontech.com>

[For testing before submission to mailing list.]

Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants