EMC: version 2.8 - reset fix. npcmX50_uart.c: restore NPCMX50_SERIAL_NAME#4
Merged
Conversation
- In npcm750_reset_mac() replaced the call to npcm750_enable_cam_command() with npcm750_ether_set_multicast_list() since it will restore the CAMCMR.AUP bit if needed. - In npcm750_return_default_idle() add few reads from REG_MCMDR before checking its value since I suspected that the first reads are not valid. - Check at runtime the value of REG_FFTCR if need a fix, prints it and fix it (relates to npcm750_return_default_idle()fix), done only when CONFIG_NPCM750_EMC_ETH_DEBUG is defined. - Add code to reset the interfaces via �cat /proc/driver/nmcp750_emc.X.reset� done enabled only when CONFIG_NPCM750_EMC_ETH_DEBUG is defined. - Some Debug improvements
avifishman
pushed a commit
that referenced
this pull request
May 9, 2018
commit ab31fd0 upstream. v4.10 commit 6f2ce1c ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit <address>, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit <address> struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 6f2ce1c ("scsi: zfcp: fix rport unblock race with LUN recovery") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
avifishman
pushed a commit
that referenced
this pull request
May 9, 2018
commit a743bbe upstream. The warning below says it all: BUG: using __this_cpu_read() in preemptible [00000000] code: swapper/0/1 caller is __this_cpu_preempt_check CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc8 #4 Call Trace: dump_stack check_preemption_disabled ? do_early_param __this_cpu_preempt_check arch_perfmon_init op_nmi_init ? alloc_pci_root_info oprofile_arch_init oprofile_init do_one_initcall ... These accessors should not have been used in the first place: it is PPro so no mixed silicon revisions and thus it can simply use boot_cpu_data. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Fix-creation-mandated-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Robert Richter <rric@kernel.org> Cc: x86@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
avifishman
pushed a commit
that referenced
this pull request
May 9, 2018
[ Upstream commit 6151b8b ] ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775b ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Jul 9, 2018
commit 3f6e698 upstream. Since commit 1bb8866 ("mtd: nand: denali: handle timing parameters by setup_data_interface()"), denali_dt.c gets the clock rate from the clock driver. The driver expects the frequency of the bus interface clock, whereas the clock driver of SOCFPGA provides the core clock. Thus, the setup_data_interface() hook calculates timing parameters based on a wrong frequency. To make it work without relying on the clock driver, hard-code the clock frequency, 200MHz. This is fine for existing DT of UniPhier, and also fixes the issue of SOCFPGA because both platforms use 200 MHz for the bus interface clock. Fixes: 1bb8866 ("mtd: nand: denali: handle timing parameters by setup_data_interface()") Cc: linux-stable <stable@vger.kernel.org> Nuvoton-Israel#4.14+ Reported-by: Philipp Rosenberger <p.rosenberger@linutronix.de> Suggested-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Jul 20, 2018
commit 0144eb2 upstream. A previous patch removed OMAP clock aliases that were perceived to be unnecessary. Unfortunately, it broke the ethernet on the am3517-evm. This patch enables the MDIO clock and EMAC clock. Fixes: 0ed266d ("clk: ti: omap3: cleanup unnecessary clock aliases") Cc: stable@vger.kernel.org Nuvoton-Israel#4.16+ Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Sep 6, 2018
commit 2546da9 upstream. The RX SGL in processing is already registered with the RX SGL tracking list to support proper cleanup. The cleanup code path uses the sg_num_bytes variable which must therefore be always initialized, even in the error code path. Signed-off-by: Stephan Mueller <smueller@chronox.de> Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com #syz test: https://github.com/google/kmsan.git master CC: <stable@vger.kernel.org> Nuvoton-Israel#4.14 Fixes: e870456 ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52 ("crypto: algif_aead - overhaul memory management") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Sep 6, 2018
commit 36eb8ff upstream. Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 Nuvoton-Israel#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 Nuvoton-Israel#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c Nuvoton-Israel#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a Nuvoton-Israel#4 [ffff891ee00039e0] no_context at ffffffffbd074643 Nuvoton-Israel#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e Nuvoton-Israel#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 Nuvoton-Israel#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a Nuvoton-Israel#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 Nuvoton-Israel#9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Nuvoton-Israel#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] Nuvoton-Israel#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] Nuvoton-Israel#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] Nuvoton-Israel#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] Nuvoton-Israel#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 Nuvoton-Israel#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 Nuvoton-Israel#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 Nuvoton-Israel#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 Nuvoton-Israel#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 Nuvoton-Israel#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- Nuvoton-Israel#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame Nuvoton-Israel#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd Nuvoton-Israel#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 Nuvoton-Israel#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 Nuvoton-Israel#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 Nuvoton-Israel#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 Nuvoton-Israel#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa Nuvoton-Israel#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca Nuvoton-Israel#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 Nuvoton-Israel#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb Nuvoton-Israel#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Sep 6, 2018
commit af1fc5b upstream. This manifsted as strace segfaulting on HSDK because gcc was targetting the accumulator registers as GPRs, which kernek was not saving/restoring by default. Cc: stable@vger.kernel.org Nuvoton-Israel#4.14+ Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Sep 6, 2018
[ Upstream commit 4f4616c ] Similar to what we do when we remove a PCI function, set the QEDF_UNLOADING flag to prevent any requests from being queued while a vport is being deleted. This prevents any requests from getting stuck in limbo when the vport is unloaded or deleted. Fixes the crash: PID: 106676 TASK: ffff9a436aa90000 CPU: 12 COMMAND: "multipathd" #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a Nuvoton-Israel#1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512 Nuvoton-Israel#2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600 Nuvoton-Israel#3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768 Nuvoton-Israel#4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52 Nuvoton-Israel#5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9 Nuvoton-Israel#6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379 Nuvoton-Israel#7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf Nuvoton-Israel#8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915 Nuvoton-Israel#9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768 [exception RIP: qedf_init_task+61] RIP: ffffffffc0e13c2d RSP: ffff9a43567d38d0 RFLAGS: 00010046 RAX: 0000000000000000 RBX: ffffbe920472c738 RCX: ffff9a434fa0e3e8 RDX: ffff9a434f695280 RSI: ffffbe920472c738 RDI: ffff9a43aa359c80 RBP: ffff9a43567d3950 R8: 0000000000000c15 R9: ffff9a3fb09b9880 R10: ffff9a434fa0e3e8 R11: ffff9a43567d35ce R12: 0000000000000000 R13: ffff9a434f695280 R14: ffff9a43aa359c80 R15: ffff9a3fb9e005c0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Sep 6, 2018
commit 89da619 upstream. Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb Nuvoton-Israel#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 Nuvoton-Israel#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 Nuvoton-Israel#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 Nuvoton-Israel#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 Nuvoton-Israel#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc Nuvoton-Israel#6 [ffff881ec7ed7838] __node_set at ffffffff81680300 Nuvoton-Israel#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f Nuvoton-Israel#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 Nuvoton-Israel#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BenjaminFair
pushed a commit
to BenjaminFair/linux
that referenced
this pull request
Sep 6, 2018
commit 4523502 upstream. Use chip shutdown at the start of unload to stop all DMA + traffic and bring down the laser. This prevents any link activities from triggering the driver to be re-engaged. Fixes: 4b60c82 ("scsi: qla2xxx: Add fw_started flags to qpair") Cc: <stable@vger.kernel.org> Nuvoton-Israel#4.16 Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Jan 10, 2022
Line 1169 (Nuvoton-Israel#3) allocates a memory chunk for victim_name by kmalloc(), but when the function returns in line 1184 (Nuvoton-Israel#4) victim_name allocated by line 1169 (Nuvoton-Israel#3) is not freed, which will lead to a memory leak. There is a similar snippet of code in this function as allocating a memory chunk for victim_name in line 1104 (Nuvoton-Israel#1) as well as releasing the memory in line 1116 (Nuvoton-Israel#2). We should kfree() victim_name when the return value of backref_in_log() is less than zero and before the function returns in line 1184 (Nuvoton-Israel#4). 1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans, 1058 struct btrfs_root *root, 1059 struct btrfs_path *path, 1060 struct btrfs_root *log_root, 1061 struct btrfs_inode *dir, 1062 struct btrfs_inode *inode, 1063 u64 inode_objectid, u64 parent_objectid, 1064 u64 ref_index, char *name, int namelen, 1065 int *search_done) 1066 { 1104 victim_name = kmalloc(victim_name_len, GFP_NOFS); // Nuvoton-Israel#1: kmalloc (victim_name-1) 1105 if (!victim_name) 1106 return -ENOMEM; 1112 ret = backref_in_log(log_root, &search_key, 1113 parent_objectid, victim_name, 1114 victim_name_len); 1115 if (ret < 0) { 1116 kfree(victim_name); // Nuvoton-Israel#2: kfree (victim_name-1) 1117 return ret; 1118 } else if (!ret) { 1169 victim_name = kmalloc(victim_name_len, GFP_NOFS); // Nuvoton-Israel#3: kmalloc (victim_name-2) 1170 if (!victim_name) 1171 return -ENOMEM; 1180 ret = backref_in_log(log_root, &search_key, 1181 parent_objectid, victim_name, 1182 victim_name_len); 1183 if (ret < 0) { 1184 return ret; // Nuvoton-Israel#4: missing kfree (victim_name-2) 1185 } else if (!ret) { 1241 return 0; 1242 } Fixes: d3316c8 ("btrfs: Properly handle backref_in_log retval") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Jan 10, 2022
The fixed commit attempts to close inject.output even if it was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 48 iofclose.c: No such file or directory. (gdb) bt #0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 Nuvoton-Israel#1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376 Nuvoton-Israel#2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085 Nuvoton-Israel#3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313 Nuvoton-Israel#4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 Nuvoton-Israel#5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 Nuvoton-Israel#6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539 (gdb) Fixes: 02e6246 ("perf inject: Close inject.output on exit") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Jan 10, 2022
The fixed commit attempts to get the output file descriptor even if the file was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. __GI___fileno (fp=0x0) at fileno.c:35 35 fileno.c: No such file or directory. (gdb) bt #0 __GI___fileno (fp=0x0) at fileno.c:35 Nuvoton-Israel#1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72 Nuvoton-Israel#2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69 Nuvoton-Israel#3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017 Nuvoton-Israel#4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313 Nuvoton-Israel#5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 Nuvoton-Israel#6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 Nuvoton-Israel#7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539 (gdb) Fixes: 0ae0389 ("perf tools: Pass a fd to perf_file_header__read_pipe()") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Feb 7, 2022
arm32 uses software to simulate the instruction replaced
by kprobe. some instructions may be simulated by constructing
assembly functions. therefore, before executing instruction
simulation, it is necessary to construct assembly function
execution environment in C language through binding registers.
after kasan is enabled, the register binding relationship will
be destroyed, resulting in instruction simulation errors and
causing kernel panic.
the kprobe emulate instruction function is distributed in three
files: actions-common.c actions-arm.c actions-thumb.c, so disable
KASAN when compiling these files.
for example, use kprobe insert on cap_capable+20 after kasan
enabled, the cap_capable assembly code is as follows:
<cap_capable>:
e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000 mov r5, r0
e280006c add r0, r0, Nuvoton-Israel#108 ; 0x6c
e1a0400 mov r4, r1
e1a06002 mov r6, r2
e59fa090 ldr sl, [pc, Nuvoton-Israel#144] ;
ebfc7bf8 bl c03aa4b4 <__asan_load4>
e595706c ldr r7, [r5, Nuvoton-Israel#108] ; 0x6c
e2859014 add r9, r5, Nuvoton-Israel#20
......
The emulate_ldr assembly code after enabling kasan is as follows:
c06f1384 <emulate_ldr>:
e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr}
e282803c add r8, r2, Nuvoton-Israel#60 ; 0x3c
e1a05000 mov r5, r0
e7e37855 ubfx r7, r5, Nuvoton-Israel#16, Nuvoton-Israel#4
e1a00008 mov r0, r8
e1a09001 mov r9, r1
e1a04002 mov r4, r2
ebf35462 bl c03c6530 <__asan_load4>
e357000f cmp r7, Nuvoton-Israel#15
e7e36655 ubfx r6, r5, Nuvoton-Israel#12, Nuvoton-Israel#4
e205a00f and sl, r5, Nuvoton-Israel#15
0a000001 beq c06f13bc <emulate_ldr+0x38>
e0840107 add r0, r4, r7, lsl Nuvoton-Israel#2
ebf3545c bl c03c6530 <__asan_load4>
e084010a add r0, r4, sl, lsl Nuvoton-Israel#2
ebf3545a bl c03c6530 <__asan_load4>
e2890010 add r0, r9, Nuvoton-Israel#16
ebf35458 bl c03c6530 <__asan_load4>
e5990010 ldr r0, [r9, Nuvoton-Israel#16]
e12fff30 blx r0
e356000f cm r6, Nuvoton-Israel#15
1a000014 bne c06f1430 <emulate_ldr+0xac>
e1a06000 mov r6, r0
e2840040 add r0, r4, Nuvoton-Israel#64 ; 0x40
......
when running in emulate_ldr to simulate the ldr instruction, panic
occurred, and the log is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [Nuvoton-Israel#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c
r10: 00000000 r9 : c30897f4 r8 : ecd63cd4
r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98
r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 32c5387d Table: 2d546400 DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from
(security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from
(copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)
Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <huangshaobo6@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Feb 22, 2022
Rolf Eike Beer reported the following bug: [1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018 [1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 Nuvoton-Israel#4 [1274934.746891] Hardware name: 9000/785/C8000 [1274934.746891] [1274934.746891] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI [1274934.746891] PSW: 00001000000001001111111000001110 Not tainted [1274934.746891] r00-03 000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000 [1274934.746891] r04-07 0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001 [1274934.746891] r08-11 0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001 [1274934.746891] r12-15 0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0 [1274934.746891] r16-19 0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007 [1274934.746891] r20-23 0000000000000006 000000004a368950 0000000000000000 0000000000000001 [1274934.746891] r24-27 0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0 [1274934.746891] r28-31 0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118 [1274934.746891] sr00-03 00000000066e5800 0000000000000000 0000000000000000 00000000066e5800 [1274934.746891] sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [1274934.746891] [1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec [1274934.746891] IIR: 48780030 ISR: 0000000000000000 IOR: 0000004140000018 [1274934.746891] CPU: 3 CR30: 00000040e3a9c000 CR31: ffffffffffffffff [1274934.746891] ORIG_R28: 0000000040acdd58 [1274934.746891] IAOQ[0]: sba_unmap_sg+0xb0/0x118 [1274934.746891] IAOQ[1]: sba_unmap_sg+0xb4/0x118 [1274934.746891] RP(r2): sba_unmap_sg+0xac/0x118 [1274934.746891] Backtrace: [1274934.746891] [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70 [1274934.746891] [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60 [1274934.746891] [<00000000407a3488>] mptscsih_io_done+0x150/0xd70 [1274934.746891] [<0000000040798600>] mpt_interrupt+0x168/0xa68 [1274934.746891] [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278 [1274934.746891] [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8 [1274934.746891] [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0 [1274934.746891] [<00000000402548e0>] generic_handle_irq+0x50/0x70 [1274934.746891] [<000000004019a254>] call_on_stack+0x18/0x24 [1274934.746891] [1274934.746891] Kernel panic - not syncing: Bad Address (null pointer deref?) The bug is caused by overrunning the sglist and incorrectly testing sg_dma_len(sglist) before nents. Normally this doesn't cause a crash, but in this case sglist crossed a page boundary. This occurs in the following code: while (sg_dma_len(sglist) && nents--) { The fix is simply to test nents first and move the decrement of nents into the loop. Reported-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Feb 22, 2022
When cifs_get_root() fails during cifs_smb3_do_mount() we call deactivate_locked_super() which eventually will call delayed_free() which will free the context. In this situation we should not proceed to enter the out: section in cifs_smb3_do_mount() and free the same resources a second time. [Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0 [Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G OE 5.17.0-rc3+ Nuvoton-Israel#4 [Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019 [Thu Feb 10 12:59:06 2022] Call Trace: [Thu Feb 10 12:59:06 2022] <IRQ> [Thu Feb 10 12:59:06 2022] dump_stack_lvl+0x5d/0x78 [Thu Feb 10 12:59:06 2022] print_address_description.constprop.0+0x24/0x150 [Thu Feb 10 12:59:06 2022] ? rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] kasan_report.cold+0x7d/0x117 [Thu Feb 10 12:59:06 2022] ? rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] __asan_load8+0x86/0xa0 [Thu Feb 10 12:59:06 2022] rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] rcu_core+0x547/0xca0 [Thu Feb 10 12:59:06 2022] ? call_rcu+0x3c0/0x3c0 [Thu Feb 10 12:59:06 2022] ? __this_cpu_preempt_check+0x13/0x20 [Thu Feb 10 12:59:06 2022] ? lock_is_held_type+0xea/0x140 [Thu Feb 10 12:59:06 2022] rcu_core_si+0xe/0x10 [Thu Feb 10 12:59:06 2022] __do_softirq+0x1d4/0x67b [Thu Feb 10 12:59:06 2022] __irq_exit_rcu+0x100/0x150 [Thu Feb 10 12:59:06 2022] irq_exit_rcu+0xe/0x30 [Thu Feb 10 12:59:06 2022] sysvec_hyperv_stimer0+0x9d/0xc0 ... [Thu Feb 10 12:59:07 2022] Freed by task 58179: [Thu Feb 10 12:59:07 2022] kasan_save_stack+0x26/0x50 [Thu Feb 10 12:59:07 2022] kasan_set_track+0x25/0x30 [Thu Feb 10 12:59:07 2022] kasan_set_free_info+0x24/0x40 [Thu Feb 10 12:59:07 2022] ____kasan_slab_free+0x137/0x170 [Thu Feb 10 12:59:07 2022] __kasan_slab_free+0x12/0x20 [Thu Feb 10 12:59:07 2022] slab_free_freelist_hook+0xb3/0x1d0 [Thu Feb 10 12:59:07 2022] kfree+0xcd/0x520 [Thu Feb 10 12:59:07 2022] cifs_smb3_do_mount+0x149/0xbe0 [cifs] [Thu Feb 10 12:59:07 2022] smb3_get_tree+0x1a0/0x2e0 [cifs] [Thu Feb 10 12:59:07 2022] vfs_get_tree+0x52/0x140 [Thu Feb 10 12:59:07 2022] path_mount+0x635/0x10c0 [Thu Feb 10 12:59:07 2022] __x64_sys_mount+0x1bf/0x210 [Thu Feb 10 12:59:07 2022] do_syscall_64+0x5c/0xc0 [Thu Feb 10 12:59:07 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae [Thu Feb 10 12:59:07 2022] Last potentially related work creation: [Thu Feb 10 12:59:07 2022] kasan_save_stack+0x26/0x50 [Thu Feb 10 12:59:07 2022] __kasan_record_aux_stack+0xb6/0xc0 [Thu Feb 10 12:59:07 2022] kasan_record_aux_stack_noalloc+0xb/0x10 [Thu Feb 10 12:59:07 2022] call_rcu+0x76/0x3c0 [Thu Feb 10 12:59:07 2022] cifs_umount+0xce/0xe0 [cifs] [Thu Feb 10 12:59:07 2022] cifs_kill_sb+0xc8/0xe0 [cifs] [Thu Feb 10 12:59:07 2022] deactivate_locked_super+0x5d/0xd0 [Thu Feb 10 12:59:07 2022] cifs_smb3_do_mount+0xab9/0xbe0 [cifs] [Thu Feb 10 12:59:07 2022] smb3_get_tree+0x1a0/0x2e0 [cifs] [Thu Feb 10 12:59:07 2022] vfs_get_tree+0x52/0x140 [Thu Feb 10 12:59:07 2022] path_mount+0x635/0x10c0 [Thu Feb 10 12:59:07 2022] __x64_sys_mount+0x1bf/0x210 [Thu Feb 10 12:59:07 2022] do_syscall_64+0x5c/0xc0 [Thu Feb 10 12:59:07 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Mar 2, 2022
I saw the below splatting after the host suspended and resumed. WARNING: CPU: 0 PID: 2943 at kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:5531 kvm_resume+0x2c/0x30 [kvm] CPU: 0 PID: 2943 Comm: step_after_susp Tainted: G W IOE 5.17.0-rc3+ Nuvoton-Israel#4 RIP: 0010:kvm_resume+0x2c/0x30 [kvm] Call Trace: <TASK> syscore_resume+0x90/0x340 suspend_devices_and_enter+0xaee/0xe90 pm_suspend.cold+0x36b/0x3c2 state_store+0x82/0xf0 kernfs_fop_write_iter+0x1b6/0x260 new_sync_write+0x258/0x370 vfs_write+0x33f/0x510 ksys_write+0xc9/0x160 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae lockdep_is_held() can return -1 when lockdep is disabled which triggers this warning. Let's use lockdep_assert_not_held() which can detect incorrect calls while holding a lock and it also avoids false negatives when lockdep is disabled. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1644920142-81249-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Mar 8, 2022
…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take Nuvoton-Israel#4 - Correctly synchronise PMR and co on PSCI CPU_SUSPEND - Skip tests that depend on GICv3 when the HW isn't available
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
May 10, 2022
When setting up a read or write to the OPB memory space, we must perform
five or six AHB writes. The ordering of these up until the trigger write
does not matter, so use writel_relaxed.
The generated code goes from (Debian GCC 10.2.1-6):
mov r8, r3
mcr 15, 0, sl, cr7, cr10, {4}
str sl, [r6, Nuvoton-Israel#20]
mcr 15, 0, sl, cr7, cr10, {4}
str r3, [r6, Nuvoton-Israel#24]
mcr 15, 0, sl, cr7, cr10, {4}
str r1, [r6, Nuvoton-Israel#28]
mcr 15, 0, sl, cr7, cr10, {4}
str r2, [r6, Nuvoton-Israel#32]
mcr 15, 0, sl, cr7, cr10, {4}
mov r1, Nuvoton-Israel#1
str r1, [r6, Nuvoton-Israel#64] ; 0x40
mcr 15, 0, sl, cr7, cr10, {4}
str r1, [r6, Nuvoton-Israel#4]
to this:
str r3, [r7, Nuvoton-Israel#20]
str r2, [r7, Nuvoton-Israel#24]
str r1, [r7, Nuvoton-Israel#28]
str r3, [r7, Nuvoton-Israel#64]
mov r8, #0
mcr 15, 0, r8, cr7, cr10, {4}
str r3, [r7, Nuvoton-Israel#4]
OpenBMC-Staging-Count: 1
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Tested-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210223041737.171274-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
May 10, 2022
As reported by Alan, the CFI (Call Frame Information) in the VDSO time routines is incorrect since commit ce7d805 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation."). DWARF has a concept called the CFA (Canonical Frame Address), which on powerpc is calculated as an offset from the stack pointer (r1). That means when the stack pointer is changed there must be a corresponding CFI directive to update the calculation of the CFA. The current code is missing those directives for the changes to r1, which prevents gdb from being able to generate a backtrace from inside VDSO functions, eg: Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime () (gdb) bt #0 0x00007ffff7f804dc in __kernel_clock_gettime () Nuvoton-Israel#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6 Nuvoton-Israel#2 0x00007fffffffd960 in ?? () Nuvoton-Israel#3 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6 Backtrace stopped: frame did not save the PC Alan helpfully describes some rules for correctly maintaining the CFI information: 1) Every adjustment to the current frame address reg (ie. r1) must be described, and exactly at the instruction where r1 changes. Why? Because stack unwinding might want to access previous frames. 2) If a function changes LR or any non-volatile register, the save location for those regs must be given. The CFI can be at any instruction after the saves up to the point that the reg is changed. (Exception: LR save should be described before a bl. not after) 3) If asychronous unwind info is needed then restores of LR and non-volatile regs must also be described. The CFI can be at any instruction after the reg is restored up to the point where the save location is (potentially) trashed. Fix the inability to backtrace by adding CFI directives describing the changes to r1, ie. satisfying rule 1. Also change the information for LR to point to the copy saved on the stack, not the value in r0 that will be overwritten by the function call. Finally, add CFI directives describing the save/restore of r2. With the fix gdb can correctly back trace and navigate up and down the stack: Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime () (gdb) bt #0 0x00007ffff7f804dc in __kernel_clock_gettime () Nuvoton-Israel#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6 Nuvoton-Israel#2 0x0000000100015b60 in gettime () Nuvoton-Israel#3 0x000000010000c8bc in print_long_format () Nuvoton-Israel#4 0x000000010000d180 in print_current_files () Nuvoton-Israel#5 0x00000001000054ac in main () (gdb) up Nuvoton-Israel#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6 (gdb) Nuvoton-Israel#2 0x0000000100015b60 in gettime () (gdb) Nuvoton-Israel#3 0x000000010000c8bc in print_long_format () (gdb) Nuvoton-Israel#4 0x000000010000d180 in print_current_files () (gdb) Nuvoton-Israel#5 0x00000001000054ac in main () (gdb) Initial frame selected; you cannot go up. (gdb) down Nuvoton-Israel#4 0x000000010000d180 in print_current_files () (gdb) Nuvoton-Israel#3 0x000000010000c8bc in print_long_format () (gdb) Nuvoton-Israel#2 0x0000000100015b60 in gettime () (gdb) Nuvoton-Israel#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6 (gdb) #0 0x00007ffff7f804dc in __kernel_clock_gettime () (gdb) Fixes: ce7d805 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Alan Modra <amodra@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/r/20220502125010.1319370-1-mpe@ellerman.id.au
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
May 24, 2022
Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be Nuvoton-Israel#1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d Nuvoton-Israel#2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd Nuvoton-Israel#3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 Nuvoton-Israel#4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 Nuvoton-Israel#5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c Nuvoton-Israel#6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 Nuvoton-Israel#7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Nuvoton-Israel#8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] Nuvoton-Israel#9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b Nuvoton-Israel#10 [ff1966a94a713f10] kthread at ffffffff9d101b4d Nuvoton-Israel#11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a7811 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Sep 6, 2022
[ Upstream commit 1a15eb7 ] For device removal and replace we call btrfs_find_device_by_devspec, which if we give it a device path and nothing else will call btrfs_get_dev_args_from_path, which opens the block device and reads the super block and then looks up our device based on that. However at this point we're holding the sb write "lock", so reading the block device pulls in the dependency of ->open_mutex, which produces the following lockdep splat ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ Nuvoton-Israel#405 Not tainted ------------------------------------------------------ losetup/11576 is trying to acquire lock: ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0 but task is already holding lock: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> Nuvoton-Israel#4 (&lo->lo_mutex){+.+.}-{3:3}: __mutex_lock+0x7d/0x750 lo_open+0x28/0x60 [loop] blkdev_get_whole+0x25/0xf0 blkdev_get_by_dev.part.0+0x168/0x3c0 blkdev_open+0xd2/0xe0 do_dentry_open+0x161/0x390 path_openat+0x3cc/0xa20 do_filp_open+0x96/0x120 do_sys_openat2+0x7b/0x130 __x64_sys_openat+0x46/0x70 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> Nuvoton-Israel#3 (&disk->open_mutex){+.+.}-{3:3}: __mutex_lock+0x7d/0x750 blkdev_get_by_dev.part.0+0x56/0x3c0 blkdev_get_by_path+0x98/0xa0 btrfs_get_bdev_and_sb+0x1b/0xb0 btrfs_find_device_by_devspec+0x12b/0x1c0 btrfs_rm_device+0x127/0x610 btrfs_ioctl+0x2a31/0x2e70 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> Nuvoton-Israel#2 (sb_writers#12){.+.+}-{0:0}: lo_write_bvec+0xc2/0x240 [loop] loop_process_work+0x238/0xd00 [loop] process_one_work+0x26b/0x560 worker_thread+0x55/0x3c0 kthread+0x140/0x160 ret_from_fork+0x1f/0x30 -> Nuvoton-Israel#1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}: process_one_work+0x245/0x560 worker_thread+0x55/0x3c0 kthread+0x140/0x160 ret_from_fork+0x1f/0x30 -> #0 ((wq_completion)loop0){+.+.}-{0:0}: __lock_acquire+0x10ea/0x1d90 lock_acquire+0xb5/0x2b0 flush_workqueue+0x91/0x5e0 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x660 [loop] block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&lo->lo_mutex); lock(&disk->open_mutex); lock(&lo->lo_mutex); lock((wq_completion)loop0); *** DEADLOCK *** 1 lock held by losetup/11576: #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop] stack backtrace: CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ Nuvoton-Israel#405 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x72 check_noncircular+0xcf/0xf0 ? stack_trace_save+0x3b/0x50 __lock_acquire+0x10ea/0x1d90 lock_acquire+0xb5/0x2b0 ? flush_workqueue+0x67/0x5e0 ? lockdep_init_map_type+0x47/0x220 flush_workqueue+0x91/0x5e0 ? flush_workqueue+0x67/0x5e0 ? verify_cpu+0xf0/0x100 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x660 [loop] ? blkdev_ioctl+0x8d/0x2a0 block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f31b02404cb Instead what we want to do is populate our device lookup args before we grab any locks, and then pass these args into btrfs_rm_device(). From there we can find the device and do the appropriate removal. Suggested-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
warp5tw
pushed a commit
to warp5tw/linux
that referenced
this pull request
Sep 6, 2022
…tion commit 07fd5b6 upstream. Each cset (css_set) is pinned by its tasks. When we're moving tasks around across csets for a migration, we need to hold the source and destination csets to ensure that they don't go away while we're moving tasks about. This is done by linking cset->mg_preload_node on either the mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the same cset->mg_preload_node for both the src and dst lists was deemed okay as a cset can't be both the source and destination at the same time. Unfortunately, this overloading becomes problematic when multiple tasks are involved in a migration and some of them are identity noop migrations while others are actually moving across cgroups. For example, this can happen with the following sequence on cgroup1: Nuvoton-Israel#1> mkdir -p /sys/fs/cgroup/misc/a/b Nuvoton-Israel#2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs Nuvoton-Israel#3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS & Nuvoton-Israel#4> PID=$! Nuvoton-Israel#5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks Nuvoton-Israel#6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs the process including the group leader back into a. In this final migration, non-leader threads would be doing identity migration while the group leader is doing an actual one. After Nuvoton-Israel#3, let's say the whole process was in cset A, and that after Nuvoton-Israel#4, the leader moves to cset B. Then, during Nuvoton-Israel#6, the following happens: 1. cgroup_migrate_add_src() is called on B for the leader. 2. cgroup_migrate_add_src() is called on A for the other threads. 3. cgroup_migrate_prepare_dst() is called. It scans the src list. 4. It notices that B wants to migrate to A, so it tries to A to the dst list but realizes that its ->mg_preload_node is already busy. 5. and then it notices A wants to migrate to A as it's an identity migration, it culls it by list_del_init()'ing its ->mg_preload_node and putting references accordingly. 6. The rest of migration takes place with B on the src list but nothing on the dst list. This means that A isn't held while migration is in progress. If all tasks leave A before the migration finishes and the incoming task pins it, the cset will be destroyed leading to use-after-free. This is caused by overloading cset->mg_preload_node for both src and dst preload lists. We wanted to exclude the cset from the src list but ended up inadvertently excluding it from the dst list too. This patch fixes the issue by separating out cset->mg_preload_node into ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst preloadings don't interfere with each other. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Reported-by: shisiyuan <shisiyuan19870131@gmail.com> Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com Link: https://www.spinics.net/lists/cgroups/msg33313.html Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tmaimon
pushed a commit
that referenced
this pull request
Aug 7, 2024
commit 22f0081 upstream. The syzbot fuzzer found that the interrupt-URB completion callback in the cdc-wdm driver was taking too long, and the driver's immediate resubmission of interrupt URBs with -EPROTO status combined with the dummy-hcd emulation to cause a CPU lockup: cdc_wdm 1-1:1.0: nonzero urb status received: -71 cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625] CPU#0 Utilization every 4s during lockup: #1: 98% system, 0% softirq, 3% hardirq, 0% idle #2: 98% system, 0% softirq, 3% hardirq, 0% idle #3: 98% system, 0% softirq, 3% hardirq, 0% idle #4: 98% system, 0% softirq, 3% hardirq, 0% idle #5: 98% system, 1% softirq, 3% hardirq, 0% idle Modules linked in: irq event stamp: 73096 hardirqs last enabled at (73095): [<ffff80008037bc00>] console_emit_next_record kernel/printk/printk.c:2935 [inline] hardirqs last enabled at (73095): [<ffff80008037bc00>] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994 hardirqs last disabled at (73096): [<ffff80008af10b00>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (73096): [<ffff80008af10b00>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (73048): [<ffff8000801ea530>] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (73048): [<ffff8000801ea530>] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582 softirqs last disabled at (73043): [<ffff800080020de8>] __do_softirq+0x14/0x20 kernel/softirq.c:588 CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G W 6.10.0-rc2-syzkaller-g8867bbd4a056 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Testing showed that the problem did not occur if the two error messages -- the first two lines above -- were removed; apparently adding material to the kernel log takes a surprisingly large amount of time. In any case, the best approach for preventing these lockups and to avoid spamming the log with thousands of error messages per second is to ratelimit the two dev_err() calls. Therefore we replace them with dev_err_ratelimited(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Suggested-by: Greg KH <gregkh@linuxfoundation.org> Reported-and-tested-by: syzbot+5f996b83575ef4058638@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/00000000000073d54b061a6a1c65@google.com/ Reported-and-tested-by: syzbot+1b2abad17596ad03dcff@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/000000000000f45085061aa9b37e@google.com/ Fixes: 9908a32 ("USB: remove err() macro from usb class drivers") Link: https://lore.kernel.org/linux-usb/40dfa45b-5f21-4eef-a8c1-51a2f320e267@rowland.harvard.edu/ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/29855215-52f5-4385-b058-91f42c2bee18@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tmaimon
pushed a commit
that referenced
this pull request
Aug 7, 2024
[ Upstream commit f1e197a ] trace_drop_common() is called with preemption disabled, and it acquires a spin_lock. This is problematic for RT kernels because spin_locks are sleeping locks in this configuration, which causes the following splat: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47 preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 5 locks held by rcuc/47/449: #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210 #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130 #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210 #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290 irq event stamp: 139909 hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80 hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290 softirqs last enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170 softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0 Preemption disabled at: [<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0 CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7 Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022 Call Trace: <TASK> dump_stack_lvl+0x8c/0xd0 dump_stack+0x14/0x20 __might_resched+0x21e/0x2f0 rt_spin_lock+0x5e/0x130 ? trace_drop_common.constprop.0+0xb5/0x290 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_drop_common.constprop.0+0xb5/0x290 ? preempt_count_sub+0x1c/0xd0 ? _raw_spin_unlock_irqrestore+0x4a/0x80 ? __pfx_trace_drop_common.constprop.0+0x10/0x10 ? rt_mutex_slowunlock+0x26a/0x2e0 ? skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_rt_mutex_slowunlock+0x10/0x10 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_kfree_skb_hit+0x15/0x20 trace_kfree_skb+0xe9/0x150 kfree_skb_reason+0x7b/0x110 skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10 ? mark_lock.part.0+0x8a/0x520 ... trace_drop_common() also disables interrupts, but this is a minor issue because we could easily replace it with a local_lock. Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic context. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reported-by: Hu Chunyu <chuhu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
tmaimon
pushed a commit
that referenced
this pull request
Aug 7, 2024
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
commit ac01c8c upstream. AddressSanitizer found a use-after-free bug in the symbol code which manifested as 'perf top' segfaulting. ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80 READ of size 1 at 0x60b00c48844b thread T193 #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310 #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286 #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614 #3 0x5650d804568f in __hists__add_entry util/hist.c:754 #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772 #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997 #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242 #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845 #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208 #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245 #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324 #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120 #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442 #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 When updating hist maps it's also necessary to update the hist symbol reference because the old one gets freed in map__put(). While this bug was probably introduced with 5c24b67 ("perf tools: Replace map->referenced & maps->removed_maps with map->refcnt"), the symbol objects were leaked until c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the bug was masked. Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") Reported-by: Yunzhao Li <yunzhao@cloudflare.com> Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com> Cc: Ian Rogers <irogers@google.com> Cc: kernel-team@cloudflare.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org # v5.13+ Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
…tion to perf_sched__replay() [ Upstream commit c690786 ] The start_work_mutex and work_done_wait_mutex are used only for the 'perf sched replay'. Put their initialization in perf_sched__replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.197 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ] # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 65658 nsecs the run test took 999991 nsecs the sleep test took 1079324 nsecs nr_run_events: 42378 nr_sleep_events: 43102 nr_wakeup_events: 31852 target-less wakeups: 17 multi-target wakeups: 712 task 0 ( swapper: 0), nr_events: 10451 task 1 ( swapper: 1), nr_events: 3 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 717 ( sched-messaging: 74483), nr_events: 152 task 718 ( sched-messaging: 74484), nr_events: 1944 task 719 ( sched-messaging: 74485), nr_events: 73 task 720 ( sched-messaging: 74486), nr_events: 163 task 721 ( sched-messaging: 74487), nr_events: 942 task 722 ( sched-messaging: 74488), nr_events: 78 task 723 ( sched-messaging: 74489), nr_events: 1090 ------------------------------------------------------------ #1 : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70 #2 : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82 #3 : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96 #4 : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64 #5 : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11 #6 : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88 #7 : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01 #8 : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40 #9 : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15 #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-2-yangjihong1@huawei.com Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string") Signed-off-by: Sasha Levin <sashal@kernel.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
…f_sched__{lat|map|replay}()
[ Upstream commit bd2cdf2 ]
The curr_pid and cpu_last_switched are used only for the
'perf sched replay/latency/map'. Put their initialization in
perf_sched__{lat|map|replay () to reduce unnecessary actions in other
commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.209 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]
# perf sched lat
-------------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end |
-------------------------------------------------------------------------------------------------------------------------------------------
sched-messaging:(401) | 2990.699 ms | 38705 | avg: 0.661 ms | max: 67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
qemu-system-x86:(7) | 179.764 ms | 2191 | avg: 0.152 ms | max: 21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
sshd:48125 | 0.522 ms | 2 | avg: 0.037 ms | max: 0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
<SNIP>
ksoftirqd/11:82 | 0.063 ms | 1 | avg: 0.005 ms | max: 0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
kworker/9:0-mm_:34624 | 0.233 ms | 20 | avg: 0.004 ms | max: 0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
migration/13:93 | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
-----------------------------------------------------------------------------------------------------------------
TOTAL: | 3180.750 ms | 41368 |
---------------------------------------------------
# echo $?
0
# perf sched map
*A0 456532.510141 secs A0 => migration/0:15
*. 456532.510171 secs . => swapper:0
. *B0 456532.510261 secs B0 => migration/1:21
. *. 456532.510279 secs
<SNIP>
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . . 456532.785979 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . 456532.786054 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . 456532.786127 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . 456532.786197 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 456532.786270 secs
# echo $?
0
# perf sched replay
run measurement overhead: 108 nsecs
sleep measurement overhead: 66473 nsecs
the run test took 1000002 nsecs
the sleep test took 1082686 nsecs
nr_run_events: 49334
nr_sleep_events: 50054
nr_wakeup_events: 34701
target-less wakeups: 165
multi-target wakeups: 766
task 0 ( swapper: 0), nr_events: 15419
task 1 ( swapper: 1), nr_events: 1
task 2 ( swapper: 2), nr_events: 1
<SNIP>
task 715 ( sched-messaging: 110248), nr_events: 1438
task 716 ( sched-messaging: 110249), nr_events: 512
task 717 ( sched-messaging: 110250), nr_events: 500
task 718 ( sched-messaging: 110251), nr_events: 537
task 719 ( sched-messaging: 110252), nr_events: 823
------------------------------------------------------------
#1 : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
#2 : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
#3 : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
#4 : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
#5 : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
#6 : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
#7 : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
#8 : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
#9 : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
#10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
# echo $?
0
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206083228.172607-5-yangjihong1@huawei.com
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <sashal@kernel.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
[ Upstream commit d1bc560 ] Add nested locking with I_MUTEX_XATTR subclass to avoid lockdep warning while handling xattr inode on file open syscall at ext4_xattr_inode_iget. Backtrace EXT4-fs (loop0): Ignoring removed oldalloc option ====================================================== WARNING: possible circular locking dependency detected 5.10.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor543/2794 is trying to acquire lock: ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline] ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425 but task is already holding lock: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem/3){++++}-{3:3}: lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566 down_write+0x93/0x180 kernel/locking/rwsem.c:1564 ext4_update_i_disksize fs/ext4/ext4.h:3267 [inline] ext4_xattr_inode_write fs/ext4/xattr.c:1390 [inline] ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1538 [inline] ext4_xattr_set_entry+0x331a/0x3d80 fs/ext4/xattr.c:1662 ext4_xattr_ibody_set+0x124/0x390 fs/ext4/xattr.c:2228 ext4_xattr_set_handle+0xc27/0x14e0 fs/ext4/xattr.c:2385 ext4_xattr_set+0x219/0x390 fs/ext4/xattr.c:2498 ext4_xattr_user_set+0xc9/0xf0 fs/ext4/xattr_user.c:40 __vfs_setxattr+0x404/0x450 fs/xattr.c:177 __vfs_setxattr_noperm+0x11d/0x4f0 fs/xattr.c:208 __vfs_setxattr_locked+0x1f9/0x210 fs/xattr.c:266 vfs_setxattr+0x112/0x2c0 fs/xattr.c:283 setxattr+0x1db/0x3e0 fs/xattr.c:548 path_setxattr+0x15a/0x240 fs/xattr.c:567 __do_sys_setxattr fs/xattr.c:582 [inline] __se_sys_setxattr fs/xattr.c:578 [inline] __x64_sys_setxattr+0xc5/0xe0 fs/xattr.c:578 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62 entry_SYSCALL_64_after_hwframe+0x61/0xcb -> #0 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2988 [inline] check_prevs_add kernel/locking/lockdep.c:3113 [inline] validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566 down_write+0x93/0x180 kernel/locking/rwsem.c:1564 inode_lock include/linux/fs.h:782 [inline] ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline] ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline] ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline] __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562 notify_change+0xbb6/0xe60 fs/attr.c:435 do_truncate+0x1de/0x2c0 fs/open.c:64 handle_truncate fs/namei.c:2970 [inline] do_open fs/namei.c:3311 [inline] path_openat+0x29f3/0x3290 fs/namei.c:3425 do_filp_open+0x20b/0x450 fs/namei.c:3452 do_sys_openat2+0x124/0x460 fs/open.c:1207 do_sys_open fs/open.c:1223 [inline] __do_sys_open fs/open.c:1231 [inline] __se_sys_open fs/open.c:1227 [inline] __x64_sys_open+0x221/0x270 fs/open.c:1227 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62 entry_SYSCALL_64_after_hwframe+0x61/0xcb other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem/3); lock(&ea_inode->i_rwsem#7/1); lock(&ei->i_data_sem/3); lock(&ea_inode->i_rwsem#7/1); *** DEADLOCK *** 5 locks held by syz-executor543/2794: #0: ffff888026fbc448 (sb_writers#4){.+.+}-{0:0}, at: mnt_want_write+0x4a/0x2a0 fs/namespace.c:365 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline] #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: do_truncate+0x1cf/0x2c0 fs/open.c:62 #2: ffff8880215e3310 (&ei->i_mmap_sem){++++}-{3:3}, at: ext4_setattr+0xec4/0x19c0 fs/ext4/inode.c:5519 #3: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_write_trylock_xattr fs/ext4/xattr.h:162 [inline] #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_try_to_expand_extra_isize fs/ext4/inode.c:5938 [inline] #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: __ext4_mark_inode_dirty+0x4fb/0x810 fs/ext4/inode.c:6018 stack backtrace: CPU: 1 PID: 2794 Comm: syz-executor543 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x177/0x211 lib/dump_stack.c:118 print_circular_bug+0x146/0x1b0 kernel/locking/lockdep.c:2002 check_noncircular+0x2cc/0x390 kernel/locking/lockdep.c:2123 check_prev_add kernel/locking/lockdep.c:2988 [inline] check_prevs_add kernel/locking/lockdep.c:3113 [inline] validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566 down_write+0x93/0x180 kernel/locking/rwsem.c:1564 inode_lock include/linux/fs.h:782 [inline] ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline] ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline] ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline] __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562 notify_change+0xbb6/0xe60 fs/attr.c:435 do_truncate+0x1de/0x2c0 fs/open.c:64 handle_truncate fs/namei.c:2970 [inline] do_open fs/namei.c:3311 [inline] path_openat+0x29f3/0x3290 fs/namei.c:3425 do_filp_open+0x20b/0x450 fs/namei.c:3452 do_sys_openat2+0x124/0x460 fs/open.c:1207 do_sys_open fs/open.c:1223 [inline] __do_sys_open fs/open.c:1231 [inline] __se_sys_open fs/open.c:1227 [inline] __x64_sys_open+0x221/0x270 fs/open.c:1227 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7f0cde4ea229 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd81d1c978 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 RAX: ffffffffffffffda RBX: 0030656c69662f30 RCX: 00007f0cde4ea229 RDX: 0000000000000089 RSI: 00000000000a0a00 RDI: 00000000200001c0 RBP: 2f30656c69662f2e R08: 0000000000208000 R09: 0000000000208000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd81d1c9c0 R13: 00007ffd81d1ca00 R14: 0000000000080000 R15: 0000000000000003 EXT4-fs error (device loop0): ext4_expand_extra_isize_ea:2730: inode #13: comm syz-executor543: corrupted in-inode xattr Signed-off-by: Wojciech Gładysz <wojciech.gladysz@infogain.com> Link: https://patch.msgid.link/20240801143827.19135-1-wojciech.gladysz@infogain.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
[ Upstream commit a848c29 ] On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
commit 348a198 upstream. Luis has been reporting an assert failure when freeing an inode cluster during inode inactivation for a while. The assert looks like: XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs] RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7 RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0 RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000 R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80 FS: 0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs __xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs process_one_work (kernel/workqueue.c:3231) worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2)) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK> And occurs when the the inode precommit handlers is attempt to look up the inode cluster buffer to attach the inode for writeback. The trail of logic that I can reconstruct is as follows. 1. the inode is clean when inodegc runs, so it is not attached to a cluster buffer when precommit runs. 2. #1 implies the inode cluster buffer may be clean and not pinned by dirty inodes when inodegc runs. 3. #2 implies that the inode cluster buffer can be reclaimed by memory pressure at any time. 4. The assert failure implies that the cluster buffer was attached to the transaction, but not marked done. It had been accessed earlier in the transaction, but not marked done. 5. #4 implies the cluster buffer has been invalidated (i.e. marked stale). 6. #5 implies that the inode cluster buffer was instantiated uninitialised in the transaction in xfs_ifree_cluster(), which only instantiates the buffers to invalidate them and never marks them as done. Given factors 1-3, this issue is highly dependent on timing and environmental factors. Hence the issue can be very difficult to reproduce in some situations, but highly reliable in others. Luis has an environment where it can be reproduced easily by g/531 but, OTOH, I've reproduced it only once in ~2000 cycles of g/531. I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag on the cluster buffers, even though they may not be initialised. The reasons why I think this is safe are: 1. A buffer cache lookup hit on a XBF_STALE buffer will clear the XBF_DONE flag. Hence all future users of the buffer know they have to re-initialise the contents before use and mark it done themselves. 2. xfs_trans_binval() sets the XFS_BLI_STALE flag, which means the buffer remains locked until the journal commit completes and the buffer is unpinned. Hence once marked XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only context that can access the freed buffer is the currently running transaction. 3. #2 implies that future buffer lookups in the currently running transaction will hit the transaction match code and not the buffer cache. Hence XBF_STALE and XFS_BLI_STALE will not be cleared unless the transaction initialises and logs the buffer with valid contents again. At which point, the buffer will be marked marked XBF_DONE again, so having XBF_DONE already set on the stale buffer is a moot point. 4. #2 also implies that any concurrent access to that cluster buffer will block waiting on the buffer lock until the inode cluster has been fully freed and is no longer an active inode cluster buffer. 5. #4 + #1 means that any future user of the disk range of that buffer will always see the range of disk blocks covered by the cluster buffer as not done, and hence must initialise the contents themselves. 6. Setting XBF_DONE in xfs_ifree_cluster() then means the unlinked inode precommit code will see a XBF_DONE buffer from the transaction match as it expects. It can then attach the stale but newly dirtied inode to the stale but newly dirtied cluster buffer without unexpected failures. The stale buffer will then sail through the journal and do the right thing with the attached stale inode during unpin. Hence the fix is just one line of extra code. The explanation of why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and complex.... Fixes: 82842fe ("xfs: fix AGF vs inode cluster buffer deadlock") Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxdog988
pushed a commit
that referenced
this pull request
Nov 5, 2024
commit 3d04139 upstream. Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf1 ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mib.[ch], because commit 6982826 ("mptcp: fallback to TCP after SYN+MPC drops"), and commit 27069e7 ("mptcp: disable active MPTCP in case of blackhole") are linked to new features, not available in this version. Resolving the conflicts is easy, simply adding the new lines declaring the new "endpoint attempt" MIB entry. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.