fixed spelling errors #184

dgucsekelly · 2015-05-11T11:57:07Z

No description provided.

ecnepsnai · 2015-05-11T20:33:16Z

Please read the documentation: https://github.com/torvalds/linux/blob/master/Documentation/SubmittingPatches

Linus doesn't accept pull requests from GitHub: #17 (comment)

Orabug: 20189959 PID: 614 TASK: ffff882a739da580 CPU: 3 COMMAND: "ocfs2dc" #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5 #2 [ffff882ecc375af0] oops_end at ffffffff815091d8 #3 [ffff882ecc375b20] die at ffffffff8101868b #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0 #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5 torvalds#6 [ffff882ecc375c40] invalid_op at ffffffff815116fb [exception RIP: ocfs2_ci_checkpointed+208] RIP: ffffffffa0a7e940 RSP: ffff882ecc375cf0 RFLAGS: 00010002 RAX: 0000000000000001 RBX: 000000000000654b RCX: ffff8812dc83f1f8 RDX: 00000000000017d9 RSI: ffff8812dc83f1f8 RDI: ffffffffa0b2c318 RBP: ffff882ecc375d20 R8: ffff882ef6ecfa60 R9: ffff88301f272200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff R13: ffff8812dc83f4f0 R14: 0000000000000000 R15: ffff8812dc83f1f8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2] torvalds#8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2] torvalds#9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2] torvalds#10 [ffff882ecc375e18] ocfs2_downconvert_thread_do_work at ffffffffa0a85445 [ocfs2] torvalds#11 [ffff882ecc375e68] ocfs2_downconvert_thread at ffffffffa0a854de [ocfs2] torvalds#12 [ffff882ecc375ee8] kthread at ffffffff81090da7 torvalds#13 [ffff882ecc375f48] kernel_thread_helper at ffffffff81511884 assert is tripped because the tran is not checkpointed and the lock level is PR. Some time ago, chmod command had been executed. As result, the following call chain left the inode cluster lock in PR state, latter on causing the assert. system_call_fastpath -> my_chmod -> sys_chmod -> sys_fchmodat -> notify_change -> ocfs2_setattr -> posix_acl_chmod -> ocfs2_iop_set_acl -> ocfs2_set_acl -> ocfs2_acl_set_mode Here is how. 1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) 1120 { 1247 ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do. .. 1258 if (!status && attr->ia_valid & ATTR_MODE) { 1259 status = posix_acl_chmod(inode, inode->i_mode); 519 posix_acl_chmod(struct inode *inode, umode_t mode) 520 { .. 539 ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS); 287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ... 288 { 289 return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL); 224 int ocfs2_set_acl(handle_t *handle, 225 struct inode *inode, ... 231 { .. 252 ret = ocfs2_acl_set_mode(inode, di_bh, 253 handle, mode); 168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ... 170 { 183 if (handle == NULL) { >>> BUG: inode lock not held in ex at this point <<< 184 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), 185 OCFS2_INODE_UPDATE_CREDITS); ocfs2_setattr.torvalds#1247 we unlock and at torvalds#1259 call posix_acl_chmod. When we reach ocfs2_acl_set_mode.torvalds#181 and do trans, the inode cluster lock is not held in EX mode (it should be). How this could have happended? We are the lock master, were holding lock EX and have released it in ocfs2_setattr.torvalds#1247. Note that there are no holders of this lock at this point. Another node needs the lock in PR, and we downconvert from EX to PR. So the inode lock is PR when do the trans in ocfs2_acl_set_mode.torvalds#184. The trans stays in core (not flushed to disc). Now another node want the lock in EX, downconvert thread gets kicked (the one that tripped assert abovt), finds an unflushed trans but the lock is not EX (it is PR). If the lock was at EX, it would have flushed the trans ocfs2_ci_checkpointed -> ocfs2_start_checkpoint before downconverting (to NULL) for the request. ocfs2_setattr must not drop inode lock ex in this code path. If it does, takes it again before the trans, say in ocfs2_set_acl, another cluster node can get in between, execute another setattr, overwriting the one in progress on this node, resulting in a mode acl size combo that is a mix of the two. Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Orabug: 20189959 PID: 614 TASK: ffff882a739da580 CPU: 3 COMMAND: "ocfs2dc" #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5 #2 [ffff882ecc375af0] oops_end at ffffffff815091d8 #3 [ffff882ecc375b20] die at ffffffff8101868b #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0 #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5 torvalds#6 [ffff882ecc375c40] invalid_op at ffffffff815116fb [exception RIP: ocfs2_ci_checkpointed+208] RIP: ffffffffa0a7e940 RSP: ffff882ecc375cf0 RFLAGS: 00010002 RAX: 0000000000000001 RBX: 000000000000654b RCX: ffff8812dc83f1f8 RDX: 00000000000017d9 RSI: ffff8812dc83f1f8 RDI: ffffffffa0b2c318 RBP: ffff882ecc375d20 R8: ffff882ef6ecfa60 R9: ffff88301f272200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff R13: ffff8812dc83f4f0 R14: 0000000000000000 R15: ffff8812dc83f1f8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2] torvalds#8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2] torvalds#9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2] torvalds#10 [ffff882ecc375e18] ocfs2_downconvert_thread_do_work at ffffffffa0a85445 [ocfs2] torvalds#11 [ffff882ecc375e68] ocfs2_downconvert_thread at ffffffffa0a854de [ocfs2] torvalds#12 [ffff882ecc375ee8] kthread at ffffffff81090da7 torvalds#13 [ffff882ecc375f48] kernel_thread_helper at ffffffff81511884 assert is tripped because the tran is not checkpointed and the lock level is PR. Some time ago, chmod command had been executed. As result, the following call chain left the inode cluster lock in PR state, latter on causing the assert. system_call_fastpath -> my_chmod -> sys_chmod -> sys_fchmodat -> notify_change -> ocfs2_setattr -> posix_acl_chmod -> ocfs2_iop_set_acl -> ocfs2_set_acl -> ocfs2_acl_set_mode Here is how. 1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) 1120 { 1247 ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do. .. 1258 if (!status && attr->ia_valid & ATTR_MODE) { 1259 status = posix_acl_chmod(inode, inode->i_mode); 519 posix_acl_chmod(struct inode *inode, umode_t mode) 520 { .. 539 ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS); 287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ... 288 { 289 return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL); 224 int ocfs2_set_acl(handle_t *handle, 225 struct inode *inode, ... 231 { .. 252 ret = ocfs2_acl_set_mode(inode, di_bh, 253 handle, mode); 168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ... 170 { 183 if (handle == NULL) { >>> BUG: inode lock not held in ex at this point <<< 184 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), 185 OCFS2_INODE_UPDATE_CREDITS); ocfs2_setattr.torvalds#1247 we unlock and at torvalds#1259 call posix_acl_chmod. When we reach ocfs2_acl_set_mode.torvalds#181 and do trans, the inode cluster lock is not held in EX mode (it should be). How this could have happended? We are the lock master, were holding lock EX and have released it in ocfs2_setattr.torvalds#1247. Note that there are no holders of this lock at this point. Another node needs the lock in PR, and we downconvert from EX to PR. So the inode lock is PR when do the trans in ocfs2_acl_set_mode.torvalds#184. The trans stays in core (not flushed to disc). Now another node want the lock in EX, downconvert thread gets kicked (the one that tripped assert abovt), finds an unflushed trans but the lock is not EX (it is PR). If the lock was at EX, it would have flushed the trans ocfs2_ci_checkpointed -> ocfs2_start_checkpoint before downconverting (to NULL) for the request. ocfs2_setattr must not drop inode lock ex in this code path. If it does, takes it again before the trans, say in ocfs2_set_acl, another cluster node can get in between, execute another setattr, overwriting the one in progress on this node, resulting in a mode acl size combo that is a mix of the two. Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

PID: 614 TASK: ffff882a739da580 CPU: 3 COMMAND: "ocfs2dc" #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5 #2 [ffff882ecc375af0] oops_end at ffffffff815091d8 #3 [ffff882ecc375b20] die at ffffffff8101868b #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0 #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5 #6 [ffff882ecc375c40] invalid_op at ffffffff815116fb [exception RIP: ocfs2_ci_checkpointed+208] RIP: ffffffffa0a7e940 RSP: ffff882ecc375cf0 RFLAGS: 00010002 RAX: 0000000000000001 RBX: 000000000000654b RCX: ffff8812dc83f1f8 RDX: 00000000000017d9 RSI: ffff8812dc83f1f8 RDI: ffffffffa0b2c318 RBP: ffff882ecc375d20 R8: ffff882ef6ecfa60 R9: ffff88301f272200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff R13: ffff8812dc83f4f0 R14: 0000000000000000 R15: ffff8812dc83f1f8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2] #8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2] #9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2] #10 [ffff882ecc375e18] ocfs2_downconvert_thread_do_work at ffffffffa0a85445 [ocfs2] #11 [ffff882ecc375e68] ocfs2_downconvert_thread at ffffffffa0a854de [ocfs2] #12 [ffff882ecc375ee8] kthread at ffffffff81090da7 #13 [ffff882ecc375f48] kernel_thread_helper at ffffffff81511884 assert is tripped because the tran is not checkpointed and the lock level is PR. Some time ago, chmod command had been executed. As result, the following call chain left the inode cluster lock in PR state, latter on causing the assert. system_call_fastpath -> my_chmod -> sys_chmod -> sys_fchmodat -> notify_change -> ocfs2_setattr -> posix_acl_chmod -> ocfs2_iop_set_acl -> ocfs2_set_acl -> ocfs2_acl_set_mode Here is how. 1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) 1120 { 1247 ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do. .. 1258 if (!status && attr->ia_valid & ATTR_MODE) { 1259 status = posix_acl_chmod(inode, inode->i_mode); 519 posix_acl_chmod(struct inode *inode, umode_t mode) 520 { .. 539 ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS); 287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ... 288 { 289 return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL); 224 int ocfs2_set_acl(handle_t *handle, 225 struct inode *inode, ... 231 { .. 252 ret = ocfs2_acl_set_mode(inode, di_bh, 253 handle, mode); 168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ... 170 { 183 if (handle == NULL) { >>> BUG: inode lock not held in ex at this point <<< 184 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), 185 OCFS2_INODE_UPDATE_CREDITS); ocfs2_setattr.#1247 we unlock and at #1259 call posix_acl_chmod. When we reach ocfs2_acl_set_mode.#181 and do trans, the inode cluster lock is not held in EX mode (it should be). How this could have happended? We are the lock master, were holding lock EX and have released it in ocfs2_setattr.#1247. Note that there are no holders of this lock at this point. Another node needs the lock in PR, and we downconvert from EX to PR. So the inode lock is PR when do the trans in ocfs2_acl_set_mode.#184. The trans stays in core (not flushed to disc). Now another node want the lock in EX, downconvert thread gets kicked (the one that tripped assert abovt), finds an unflushed trans but the lock is not EX (it is PR). If the lock was at EX, it would have flushed the trans ocfs2_ci_checkpointed -> ocfs2_start_checkpoint before downconverting (to NULL) for the request. ocfs2_setattr must not drop inode lock ex in this code path. If it does, takes it again before the trans, say in ocfs2_set_acl, another cluster node can get in between, execute another setattr, overwriting the one in progress on this node, resulting in a mode acl size combo that is a mix of the two. Orabug: 20189959 Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

The PM runtime API can't be used in atomic contex on -RT even if it's configured as irqsafe. As result, below error report can be seen when PM runtime API called from IRQ chip's callbacks irq_startup/irq_shutdown/irq_set_type, because they are protected by RAW spinlock: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 1, irqs_disabled(): 128, pid: 96, name: insmod 3 locks held by insmod/96: #0: (&dev->mutex){......}, at: [<c04752c8>] __driver_attach+0x54/0xa0 #1: (&dev->mutex){......}, at: [<c04752d4>] __driver_attach+0x60/0xa0 #2: (class){......}, at: [<c00a408c>] __irq_get_desc_lock+0x60/0xa4 irq event stamp: 1834 hardirqs last enabled at (1833): [<c06ab2a4>] _raw_spin_unlock_irqrestore+0x88/0x90 hardirqs last disabled at (1834): [<c06ab068>] _raw_spin_lock_irqsave+0x2c/0x64 softirqs last enabled at (0): [<c003d220>] copy_process.part.52+0x410/0x19d8 softirqs last disabled at (0): [< (null)>] (null) Preemption disabled at:[< (null)>] (null) CPU: 1 PID: 96 Comm: insmod Tainted: G W O 4.1.3-rt3-00618-g57e2387-dirty torvalds#184 Hardware name: Generic DRA74X (Flattened Device Tree) [<c00190f4>] (unwind_backtrace) from [<c0014734>] (show_stack+0x20/0x24) [<c0014734>] (show_stack) from [<c06a62ec>] (dump_stack+0x88/0xdc) [<c06a62ec>] (dump_stack) from [<c006ca44>] (___might_sleep+0x198/0x2a8) [<c006ca44>] (___might_sleep) from [<c06ab6d4>] (rt_spin_lock+0x30/0x70) [<c06ab6d4>] (rt_spin_lock) from [<c04815ac>] (__pm_runtime_resume+0x68/0xa4) [<c04815ac>] (__pm_runtime_resume) from [<c04123f4>] (omap_gpio_irq_type+0x188/0x1d8) [<c04123f4>] (omap_gpio_irq_type) from [<c00a64e4>] (__irq_set_trigger+0x68/0x130) [<c00a64e4>] (__irq_set_trigger) from [<c00a7bc4>] (irq_set_irq_type+0x44/0x6c) [<c00a7bc4>] (irq_set_irq_type) from [<c00abbf8>] (irq_create_of_mapping+0x120/0x174) [<c00abbf8>] (irq_create_of_mapping) from [<c0577b74>] (of_irq_get+0x48/0x58) [<c0577b74>] (of_irq_get) from [<c0540a14>] (i2c_device_probe+0x54/0x15c) [<c0540a14>] (i2c_device_probe) from [<c04750dc>] (driver_probe_device+0x184/0x2c8) [<c04750dc>] (driver_probe_device) from [<c0475310>] (__driver_attach+0x9c/0xa0) [<c0475310>] (__driver_attach) from [<c0473238>] (bus_for_each_dev+0x7c/0xb0) [<c0473238>] (bus_for_each_dev) from [<c0474af4>] (driver_attach+0x28/0x30) [<c0474af4>] (driver_attach) from [<c0474760>] (bus_add_driver+0x154/0x200) [<c0474760>] (bus_add_driver) from [<c0476348>] (driver_register+0x88/0x108) [<c0476348>] (driver_register) from [<c0541600>] (i2c_register_driver+0x3c/0x90) [<c0541600>] (i2c_register_driver) from [<bf003018>] (pcf857x_init+0x18/0x24 [gpio_pcf857x]) [<bf003018>] (pcf857x_init [gpio_pcf857x]) from [<c000998c>] (do_one_initcall+0x128/0x1e8) [<c000998c>] (do_one_initcall) from [<c06a4220>] (do_init_module+0x6c/0x1bc) [<c06a4220>] (do_init_module) from [<c00dd0c8>] (load_module+0x18e8/0x21c4) [<c00dd0c8>] (load_module) from [<c00ddaa0>] (SyS_init_module+0xfc/0x158) [<c00ddaa0>] (SyS_init_module) from [<c000ff40>] (ret_fast_syscall+0x0/0x54) The IRQ chip interface defines only two callbacks which are executed in non-atomic contex - irq_bus_lock/irq_bus_sync_unlock, so lets move PM runtime calls there. Cc: <linux-rt-users@vger.kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Austin Schuh <austin@peloton-tech.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>

In multi-core system, if the clock is not sync perfectly, it will make cycle_last that recorded by CPU-A is a little more than cycle_now that read by CPU-B. With the negative result, hrtimer_update_base() return a huge and wrong time. It leads to the cpu can not finish the while loop in hrtimer_interrupt() until the real nowtime which is returned from ktime_get() catch up with the wrong time on clock monotonic base. I was able to reproudce the problem with calling clock_settime and clock_adjtime repeatedly on each cpu. The params of the calls is random. Here is the calltrace: Jan 01 00:02:29 Linux kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jan 01 00:02:29 Linux kernel: 0: (2 GPs behind) idle=913/1/0 softirq=59289/59291 fqs=488 Jan 01 00:02:29 Linux kernel: (detected by 20, t=5252 jiffies, g=35769, c=35768, q=567) Jan 01 00:02:29 Linux kernel: Task dump for CPU 0: Jan 01 00:02:29 Linux kernel: swapper/0 R running task 0 0 0 0x00000002 Jan 01 00:02:29 Linux kernel: Call trace: Jan 01 00:02:29 Linux kernel: [<ffffffc000086c5c>] __switch_to+0x74/0x8c Jan 01 00:02:29 Linux kernel: rcu_sched kthread starved for 4764 jiffies! Jan 01 00:03:32 Linux kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0] Jan 01 00:03:32 Linux kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.6+ torvalds#184 Jan 01 00:03:32 Linux kernel: task: ffffffc00091cdf0 ti: ffffffc000910000 task.ti: ffffffc000910000 Jan 01 00:03:32 Linux kernel: PC is at arch_cpu_idle+0x10/0x18 Jan 01 00:03:32 Linux kernel: LR is at arch_cpu_idle+0xc/0x18 Jan 01 00:03:32 Linux kernel: pc : [<ffffffc00008686c>] lr : [<ffffffc000086868>] pstate: 60000145 Jan 01 00:03:32 Linux kernel: sp : ffffffc000913f20 Jan 01 00:03:32 Linux kernel: x29: ffffffc000913f20 x28: 000000003f4bbab0 Jan 01 00:03:32 Linux kernel: x27: ffffffc00091669c x26: ffffffc00096e000 Jan 01 00:03:32 Linux kernel: x25: ffffffc000804000 x24: ffffffc000913f30 Jan 01 00:03:32 Linux kernel: x23: 0000000000000001 x22: ffffffc0006817f8 Jan 01 00:03:32 Linux kernel: x21: ffffffc0008fdb00 x20: ffffffc000916618 Jan 01 00:03:32 Linux kernel: x19: ffffffc000910000 x18: 00000000ffffffff Jan 01 00:03:32 Linux kernel: x17: 0000007f9d6f682c x16: ffffffc0001e19d0 Jan 01 00:03:32 Linux kernel: x15: 0000000000000061 x14: 0000000000000072 Jan 01 00:03:32 Linux kernel: x13: 0000000000000067 x12: ffffffc000682528 Jan 01 00:03:32 Linux kernel: x11: 0000000000000005 x10: 00000001000faf9a Jan 01 00:03:32 Linux kernel: x9 : ffffffc000913e60 x8 : ffffffc00091d350 Jan 01 00:03:32 Linux kernel: x7 : 0000000000000000 x6 : 002b24c4f00026aa Jan 01 00:03:32 Linux kernel: x5 : 0000001ffd5c6000 x4 : ffffffc000913ea0 Jan 01 00:03:32 Linux kernel: x3 : ffffffdffdec3b44 x2 : ffffffdffdec3b44 Jan 01 00:03:32 Linux kernel: x1 : 0000000000000000 x0 : 0000000000000000 CPU-A updates the cycle_last in do_settimeofday64() under lock and CPU-B reads the current cycles which is slightly behind CPU-A to substract the cycle_last after unlock, then we get a negative result, after masking it comes to a extremely huge value and lead to "hang" in hrtimer_interrupt(). And multi-core system on X86 had already met such problem and Thomas introduce a fix which is commit 47001d6 ("x86: tsc prevent time going backwards"). And then Thomas moved the fix code into the core code file of time in commit 09ec544 ("clocksource: Move cycle_last validation to core code"). Now the validation can be enabled by config CLOCKSOURCE_VALIDATE_LAST_CYCLE. I think we can fix the problem on arm64 by selecting the config. This is no side effect for systems with counters running properly. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de>

The PM runtime API can't be used in atomic contex on -RT even if it's configured as irqsafe. As result, below error report can be seen when PM runtime API called from IRQ chip's callbacks irq_startup/irq_shutdown/irq_set_type, because they are protected by RAW spinlock: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 1, irqs_disabled(): 128, pid: 96, name: insmod 3 locks held by insmod/96: #0: (&dev->mutex){......}, at: [<c04752c8>] __driver_attach+0x54/0xa0 #1: (&dev->mutex){......}, at: [<c04752d4>] __driver_attach+0x60/0xa0 #2: (class){......}, at: [<c00a408c>] __irq_get_desc_lock+0x60/0xa4 irq event stamp: 1834 hardirqs last enabled at (1833): [<c06ab2a4>] _raw_spin_unlock_irqrestore+0x88/0x90 hardirqs last disabled at (1834): [<c06ab068>] _raw_spin_lock_irqsave+0x2c/0x64 softirqs last enabled at (0): [<c003d220>] copy_process.part.52+0x410/0x19d8 softirqs last disabled at (0): [< (null)>] (null) Preemption disabled at:[< (null)>] (null) CPU: 1 PID: 96 Comm: insmod Tainted: G W O 4.1.3-rt3-00618-g57e2387-dirty torvalds#184 Hardware name: Generic DRA74X (Flattened Device Tree) [<c00190f4>] (unwind_backtrace) from [<c0014734>] (show_stack+0x20/0x24) [<c0014734>] (show_stack) from [<c06a62ec>] (dump_stack+0x88/0xdc) [<c06a62ec>] (dump_stack) from [<c006ca44>] (___might_sleep+0x198/0x2a8) [<c006ca44>] (___might_sleep) from [<c06ab6d4>] (rt_spin_lock+0x30/0x70) [<c06ab6d4>] (rt_spin_lock) from [<c04815ac>] (__pm_runtime_resume+0x68/0xa4) [<c04815ac>] (__pm_runtime_resume) from [<c04123f4>] (omap_gpio_irq_type+0x188/0x1d8) [<c04123f4>] (omap_gpio_irq_type) from [<c00a64e4>] (__irq_set_trigger+0x68/0x130) [<c00a64e4>] (__irq_set_trigger) from [<c00a7bc4>] (irq_set_irq_type+0x44/0x6c) [<c00a7bc4>] (irq_set_irq_type) from [<c00abbf8>] (irq_create_of_mapping+0x120/0x174) [<c00abbf8>] (irq_create_of_mapping) from [<c0577b74>] (of_irq_get+0x48/0x58) [<c0577b74>] (of_irq_get) from [<c0540a14>] (i2c_device_probe+0x54/0x15c) [<c0540a14>] (i2c_device_probe) from [<c04750dc>] (driver_probe_device+0x184/0x2c8) [<c04750dc>] (driver_probe_device) from [<c0475310>] (__driver_attach+0x9c/0xa0) [<c0475310>] (__driver_attach) from [<c0473238>] (bus_for_each_dev+0x7c/0xb0) [<c0473238>] (bus_for_each_dev) from [<c0474af4>] (driver_attach+0x28/0x30) [<c0474af4>] (driver_attach) from [<c0474760>] (bus_add_driver+0x154/0x200) [<c0474760>] (bus_add_driver) from [<c0476348>] (driver_register+0x88/0x108) [<c0476348>] (driver_register) from [<c0541600>] (i2c_register_driver+0x3c/0x90) [<c0541600>] (i2c_register_driver) from [<bf003018>] (pcf857x_init+0x18/0x24 [gpio_pcf857x]) [<bf003018>] (pcf857x_init [gpio_pcf857x]) from [<c000998c>] (do_one_initcall+0x128/0x1e8) [<c000998c>] (do_one_initcall) from [<c06a4220>] (do_init_module+0x6c/0x1bc) [<c06a4220>] (do_init_module) from [<c00dd0c8>] (load_module+0x18e8/0x21c4) [<c00dd0c8>] (load_module) from [<c00ddaa0>] (SyS_init_module+0xfc/0x158) [<c00ddaa0>] (SyS_init_module) from [<c000ff40>] (ret_fast_syscall+0x0/0x54) The IRQ chip interface defines only two callbacks which are executed in non-atomic contex - irq_bus_lock/irq_bus_sync_unlock, so lets move PM runtime calls there. Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Austin Schuh <austin@peloton-tech.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

The PM runtime API can't be used in atomic contex on -RT even if it's configured as irqsafe. As result, below error report can be seen when PM runtime API called from IRQ chip's callbacks irq_startup/irq_shutdown/irq_set_type, because they are protected by RAW spinlock: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 1, irqs_disabled(): 128, pid: 96, name: insmod 3 locks held by insmod/96: #0: (&dev->mutex){......}, at: [<c04752c8>] __driver_attach+0x54/0xa0 #1: (&dev->mutex){......}, at: [<c04752d4>] __driver_attach+0x60/0xa0 #2: (class){......}, at: [<c00a408c>] __irq_get_desc_lock+0x60/0xa4 irq event stamp: 1834 hardirqs last enabled at (1833): [<c06ab2a4>] _raw_spin_unlock_irqrestore+0x88/0x90 hardirqs last disabled at (1834): [<c06ab068>] _raw_spin_lock_irqsave+0x2c/0x64 softirqs last enabled at (0): [<c003d220>] copy_process.part.52+0x410/0x19d8 softirqs last disabled at (0): [< (null)>] (null) Preemption disabled at:[< (null)>] (null) CPU: 1 PID: 96 Comm: insmod Tainted: G W O 4.1.3-rt3-00618-g57e2387-dirty torvalds#184 Hardware name: Generic DRA74X (Flattened Device Tree) [<c00190f4>] (unwind_backtrace) from [<c0014734>] (show_stack+0x20/0x24) [<c0014734>] (show_stack) from [<c06a62ec>] (dump_stack+0x88/0xdc) [<c06a62ec>] (dump_stack) from [<c006ca44>] (___might_sleep+0x198/0x2a8) [<c006ca44>] (___might_sleep) from [<c06ab6d4>] (rt_spin_lock+0x30/0x70) [<c06ab6d4>] (rt_spin_lock) from [<c04815ac>] (__pm_runtime_resume+0x68/0xa4) [<c04815ac>] (__pm_runtime_resume) from [<c04123f4>] (omap_gpio_irq_type+0x188/0x1d8) [<c04123f4>] (omap_gpio_irq_type) from [<c00a64e4>] (__irq_set_trigger+0x68/0x130) [<c00a64e4>] (__irq_set_trigger) from [<c00a7bc4>] (irq_set_irq_type+0x44/0x6c) [<c00a7bc4>] (irq_set_irq_type) from [<c00abbf8>] (irq_create_of_mapping+0x120/0x174) [<c00abbf8>] (irq_create_of_mapping) from [<c0577b74>] (of_irq_get+0x48/0x58) [<c0577b74>] (of_irq_get) from [<c0540a14>] (i2c_device_probe+0x54/0x15c) [<c0540a14>] (i2c_device_probe) from [<c04750dc>] (driver_probe_device+0x184/0x2c8) [<c04750dc>] (driver_probe_device) from [<c0475310>] (__driver_attach+0x9c/0xa0) [<c0475310>] (__driver_attach) from [<c0473238>] (bus_for_each_dev+0x7c/0xb0) [<c0473238>] (bus_for_each_dev) from [<c0474af4>] (driver_attach+0x28/0x30) [<c0474af4>] (driver_attach) from [<c0474760>] (bus_add_driver+0x154/0x200) [<c0474760>] (bus_add_driver) from [<c0476348>] (driver_register+0x88/0x108) [<c0476348>] (driver_register) from [<c0541600>] (i2c_register_driver+0x3c/0x90) [<c0541600>] (i2c_register_driver) from [<bf003018>] (pcf857x_init+0x18/0x24 [gpio_pcf857x]) [<bf003018>] (pcf857x_init [gpio_pcf857x]) from [<c000998c>] (do_one_initcall+0x128/0x1e8) [<c000998c>] (do_one_initcall) from [<c06a4220>] (do_init_module+0x6c/0x1bc) [<c06a4220>] (do_init_module) from [<c00dd0c8>] (load_module+0x18e8/0x21c4) [<c00dd0c8>] (load_module) from [<c00ddaa0>] (SyS_init_module+0xfc/0x158) [<c00ddaa0>] (SyS_init_module) from [<c000ff40>] (ret_fast_syscall+0x0/0x54) The IRQ chip interface defines only two callbacks which are executed in non-atomic contex - irq_bus_lock/irq_bus_sync_unlock, so lets move PM runtime calls there. Cc: <linux-rt-users@vger.kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Austin Schuh <austin@peloton-tech.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>

commit 3d46a44 upstream. PID: 614 TASK: ffff882a739da580 CPU: 3 COMMAND: "ocfs2dc" #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5 #2 [ffff882ecc375af0] oops_end at ffffffff815091d8 #3 [ffff882ecc375b20] die at ffffffff8101868b #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0 #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5 torvalds#6 [ffff882ecc375c40] invalid_op at ffffffff815116fb [exception RIP: ocfs2_ci_checkpointed+208] RIP: ffffffffa0a7e940 RSP: ffff882ecc375cf0 RFLAGS: 00010002 RAX: 0000000000000001 RBX: 000000000000654b RCX: ffff8812dc83f1f8 RDX: 00000000000017d9 RSI: ffff8812dc83f1f8 RDI: ffffffffa0b2c318 RBP: ffff882ecc375d20 R8: ffff882ef6ecfa60 R9: ffff88301f272200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff R13: ffff8812dc83f4f0 R14: 0000000000000000 R15: ffff8812dc83f1f8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2] torvalds#8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2] torvalds#9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2] assert is tripped because the tran is not checkpointed and the lock level is PR. Some time ago, chmod command had been executed. As result, the following call chain left the inode cluster lock in PR state, latter on causing the assert. system_call_fastpath -> my_chmod -> sys_chmod -> sys_fchmodat -> notify_change -> ocfs2_setattr -> posix_acl_chmod -> ocfs2_iop_set_acl -> ocfs2_set_acl -> ocfs2_acl_set_mode Here is how. 1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) 1120 { 1247 ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do. .. 1258 if (!status && attr->ia_valid & ATTR_MODE) { 1259 status = posix_acl_chmod(inode, inode->i_mode); 519 posix_acl_chmod(struct inode *inode, umode_t mode) 520 { .. 539 ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS); 287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ... 288 { 289 return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL); 224 int ocfs2_set_acl(handle_t *handle, 225 struct inode *inode, ... 231 { .. 252 ret = ocfs2_acl_set_mode(inode, di_bh, 253 handle, mode); 168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ... 170 { 183 if (handle == NULL) { >>> BUG: inode lock not held in ex at this point <<< 184 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), 185 OCFS2_INODE_UPDATE_CREDITS); ocfs2_setattr.torvalds#1247 we unlock and at torvalds#1259 call posix_acl_chmod. When we reach ocfs2_acl_set_mode.torvalds#181 and do trans, the inode cluster lock is not held in EX mode (it should be). How this could have happended? We are the lock master, were holding lock EX and have released it in ocfs2_setattr.torvalds#1247. Note that there are no holders of this lock at this point. Another node needs the lock in PR, and we downconvert from EX to PR. So the inode lock is PR when do the trans in ocfs2_acl_set_mode.torvalds#184. The trans stays in core (not flushed to disc). Now another node want the lock in EX, downconvert thread gets kicked (the one that tripped assert abovt), finds an unflushed trans but the lock is not EX (it is PR). If the lock was at EX, it would have flushed the trans ocfs2_ci_checkpointed -> ocfs2_start_checkpoint before downconverting (to NULL) for the request. ocfs2_setattr must not drop inode lock ex in this code path. If it does, takes it again before the trans, say in ocfs2_set_acl, another cluster node can get in between, execute another setattr, overwriting the one in progress on this node, resulting in a mode acl size combo that is a mix of the two. Orabug: 20189959 Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

lkl: Add offload (TSO4, CSUM) support to LKL device, #2 of 2

commit 3d46a44 upstream. PID: 614 TASK: ffff882a739da580 CPU: 3 COMMAND: "ocfs2dc" #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5 #2 [ffff882ecc375af0] oops_end at ffffffff815091d8 #3 [ffff882ecc375b20] die at ffffffff8101868b #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0 #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5 torvalds#6 [ffff882ecc375c40] invalid_op at ffffffff815116fb [exception RIP: ocfs2_ci_checkpointed+208] RIP: ffffffffa0a7e940 RSP: ffff882ecc375cf0 RFLAGS: 00010002 RAX: 0000000000000001 RBX: 000000000000654b RCX: ffff8812dc83f1f8 RDX: 00000000000017d9 RSI: ffff8812dc83f1f8 RDI: ffffffffa0b2c318 RBP: ffff882ecc375d20 R8: ffff882ef6ecfa60 R9: ffff88301f272200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff R13: ffff8812dc83f4f0 R14: 0000000000000000 R15: ffff8812dc83f1f8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2] torvalds#8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2] torvalds#9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2] assert is tripped because the tran is not checkpointed and the lock level is PR. Some time ago, chmod command had been executed. As result, the following call chain left the inode cluster lock in PR state, latter on causing the assert. system_call_fastpath -> my_chmod -> sys_chmod -> sys_fchmodat -> notify_change -> ocfs2_setattr -> posix_acl_chmod -> ocfs2_iop_set_acl -> ocfs2_set_acl -> ocfs2_acl_set_mode Here is how. 1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) 1120 { 1247 ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do. .. 1258 if (!status && attr->ia_valid & ATTR_MODE) { 1259 status = posix_acl_chmod(inode, inode->i_mode); 519 posix_acl_chmod(struct inode *inode, umode_t mode) 520 { .. 539 ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS); 287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ... 288 { 289 return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL); 224 int ocfs2_set_acl(handle_t *handle, 225 struct inode *inode, ... 231 { .. 252 ret = ocfs2_acl_set_mode(inode, di_bh, 253 handle, mode); 168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ... 170 { 183 if (handle == NULL) { >>> BUG: inode lock not held in ex at this point <<< 184 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), 185 OCFS2_INODE_UPDATE_CREDITS); ocfs2_setattr.torvalds#1247 we unlock and at torvalds#1259 call posix_acl_chmod. When we reach ocfs2_acl_set_mode.torvalds#181 and do trans, the inode cluster lock is not held in EX mode (it should be). How this could have happended? We are the lock master, were holding lock EX and have released it in ocfs2_setattr.torvalds#1247. Note that there are no holders of this lock at this point. Another node needs the lock in PR, and we downconvert from EX to PR. So the inode lock is PR when do the trans in ocfs2_acl_set_mode.torvalds#184. The trans stays in core (not flushed to disc). Now another node want the lock in EX, downconvert thread gets kicked (the one that tripped assert abovt), finds an unflushed trans but the lock is not EX (it is PR). If the lock was at EX, it would have flushed the trans ocfs2_ci_checkpointed -> ocfs2_start_checkpoint before downconverting (to NULL) for the request. ocfs2_setattr must not drop inode lock ex in this code path. If it does, takes it again before the trans, say in ocfs2_set_acl, another cluster node can get in between, execute another setattr, overwriting the one in progress on this node, resulting in a mode acl size combo that is a mix of the two. Orabug: 20189959 Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>

…from increase_ipc_memory to processor-sdk-linux-4.19.y * commit '1b430bd15bc7612371157d64f11ddffec4e484fe': TEMP: arm64: dts: ti: k3-am654-base-board: Increase reserve memory for RTOS IPC

This commit fixes the following checkpatch.pl errors: ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#170: FILE: ./hal/HalBtc8723b1Ant.h:170: +void EXhalbtc8723b1ant_PowerOnSetting(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#171: FILE: ./hal/HalBtc8723b1Ant.h:171: +void EXhalbtc8723b1ant_InitHwConfig(struct BTC_COEXIST * pBtCoexist, bool bWifiOnly); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#172: FILE: ./hal/HalBtc8723b1Ant.h:172: +void EXhalbtc8723b1ant_InitCoexDm(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#173: FILE: ./hal/HalBtc8723b1Ant.h:173: +void EXhalbtc8723b1ant_IpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#174: FILE: ./hal/HalBtc8723b1Ant.h:174: +void EXhalbtc8723b1ant_LpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#175: FILE: ./hal/HalBtc8723b1Ant.h:175: +void EXhalbtc8723b1ant_ScanNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#176: FILE: ./hal/HalBtc8723b1Ant.h:176: +void EXhalbtc8723b1ant_ConnectNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#177: FILE: ./hal/HalBtc8723b1Ant.h:177: +void EXhalbtc8723b1ant_MediaStatusNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#178: FILE: ./hal/HalBtc8723b1Ant.h:178: +void EXhalbtc8723b1ant_SpecialPacketNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#180: FILE: ./hal/HalBtc8723b1Ant.h:180: + struct BTC_COEXIST * pBtCoexist, u8 *tmpBuf, u8 length ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#182: FILE: ./hal/HalBtc8723b1Ant.h:182: +void EXhalbtc8723b1ant_HaltNotify(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#183: FILE: ./hal/HalBtc8723b1Ant.h:183: +void EXhalbtc8723b1ant_PnpNotify(struct BTC_COEXIST * pBtCoexist, u8 pnpState); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#184: FILE: ./hal/HalBtc8723b1Ant.h:184: +void EXhalbtc8723b1ant_Periodical(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#185: FILE: ./hal/HalBtc8723b1Ant.h:185: +void EXhalbtc8723b1ant_DisplayCoexInfo(struct BTC_COEXIST * pBtCoexist); Signed-off-by: Marco Cesati <marcocesati@gmail.com>

This commit fixes the following checkpatch.pl errors: ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#170: FILE: ./hal/HalBtc8723b1Ant.h:170: +void EXhalbtc8723b1ant_PowerOnSetting(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#171: FILE: ./hal/HalBtc8723b1Ant.h:171: +void EXhalbtc8723b1ant_InitHwConfig(struct BTC_COEXIST * pBtCoexist, bool bWifiOnly); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#172: FILE: ./hal/HalBtc8723b1Ant.h:172: +void EXhalbtc8723b1ant_InitCoexDm(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#173: FILE: ./hal/HalBtc8723b1Ant.h:173: +void EXhalbtc8723b1ant_IpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#174: FILE: ./hal/HalBtc8723b1Ant.h:174: +void EXhalbtc8723b1ant_LpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#175: FILE: ./hal/HalBtc8723b1Ant.h:175: +void EXhalbtc8723b1ant_ScanNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#176: FILE: ./hal/HalBtc8723b1Ant.h:176: +void EXhalbtc8723b1ant_ConnectNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#177: FILE: ./hal/HalBtc8723b1Ant.h:177: +void EXhalbtc8723b1ant_MediaStatusNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#178: FILE: ./hal/HalBtc8723b1Ant.h:178: +void EXhalbtc8723b1ant_SpecialPacketNotify(struct BTC_COEXIST * pBtCoexist, u8 type); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#180: FILE: ./hal/HalBtc8723b1Ant.h:180: + struct BTC_COEXIST * pBtCoexist, u8 *tmpBuf, u8 length ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#182: FILE: ./hal/HalBtc8723b1Ant.h:182: +void EXhalbtc8723b1ant_HaltNotify(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#183: FILE: ./hal/HalBtc8723b1Ant.h:183: +void EXhalbtc8723b1ant_PnpNotify(struct BTC_COEXIST * pBtCoexist, u8 pnpState); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#184: FILE: ./hal/HalBtc8723b1Ant.h:184: +void EXhalbtc8723b1ant_Periodical(struct BTC_COEXIST * pBtCoexist); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#185: FILE: ./hal/HalBtc8723b1Ant.h:185: +void EXhalbtc8723b1ant_DisplayCoexInfo(struct BTC_COEXIST * pBtCoexist); Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marco Cesati <marcocesati@gmail.com> Link: https://lore.kernel.org/r/20210315170618.2566-4-marcocesati@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

A few improvements

Use the __string() machinery provided by the tracing subystem to make a copy of the string literals consumed by the "nested VM-Enter failed" tracepoint. A complete copy is necessary to ensure that the tracepoint can't outlive the data/memory it consumes and deference stale memory. Because the tracepoint itself is defined by kvm, if kvm-intel and/or kvm-amd are built as modules, the memory holding the string literals defined by the vendor modules will be freed when the module is unloaded, whereas the tracepoint and its data in the ring buffer will live until kvm is unloaded (or "indefinitely" if kvm is built-in). This bug has existed since the tracepoint was added, but was recently exposed by a new check in tracing to detect exactly this type of bug. fmt: '%s%s ' current_buffer: ' vmx_dirty_log_t-140127 [003] .... kvm_nested_vmenter_failed: ' WARNING: CPU: 3 PID: 140134 at kernel/trace/trace.c:3759 trace_check_vprintf+0x3be/0x3e0 CPU: 3 PID: 140134 Comm: less Not tainted 5.13.0-rc1-ce2e73ce600a-req torvalds#184 Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014 RIP: 0010:trace_check_vprintf+0x3be/0x3e0 Code: <0f> 0b 44 8b 4c 24 1c e9 a9 fe ff ff c6 44 02 ff 00 49 8b 97 b0 20 RSP: 0018:ffffa895cc37bcb0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffa895cc37bd08 RCX: 0000000000000027 RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff9766cfad74f8 RBP: ffffffffc0a041d4 R08: ffff9766cfad74f0 R09: ffffa895cc37bad8 R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc0a041d4 R13: ffffffffc0f4dba8 R14: 0000000000000000 R15: ffff976409f2c000 FS: 00007f92fa200740(0000) GS:ffff9766cfac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559bd11b0000 CR3: 000000019fbaa002 CR4: 00000000001726e0 Call Trace: trace_event_printf+0x5e/0x80 trace_raw_output_kvm_nested_vmenter_failed+0x3a/0x60 [kvm] print_trace_line+0x1dd/0x4e0 s_show+0x45/0x150 seq_read_iter+0x2d5/0x4c0 seq_read+0x106/0x150 vfs_read+0x98/0x180 ksys_read+0x5f/0xe0 do_syscall_64+0x40/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: Steven Rostedt <rostedt@goodmis.org> Fixes: 380e005 ("KVM: nVMX: trace nested VM-Enter failures detected by H/W") Signed-off-by: Sean Christopherson <seanjc@google.com>

Add a big batch of selftest to extend test_progs with various tc link, attach ops and old-style tc BPF attachments via libbpf APIs. Also test multi-program attachments including mixing the various attach options: # ./test_progs -t tc_link torvalds#179 tc_link_base:OK torvalds#180 tc_link_detach:OK torvalds#181 tc_link_mix:OK torvalds#182 tc_link_opts:OK torvalds#183 tc_link_run_base:OK torvalds#184 tc_link_run_chain:OK Summary: 6/0 PASSED, 0 SKIPPED, 0 FAILED All new and existing test cases pass. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Add a big batch of selftest to extend test_progs with various tc link, attach ops and old-style tc BPF attachments via libbpf APIs. Also test multi-program attachments including mixing the various attach options: # ./test_progs -t tc_link torvalds#179 tc_link_base:OK torvalds#180 tc_link_detach:OK torvalds#181 tc_link_mix:OK torvalds#182 tc_link_opts:OK torvalds#183 tc_link_run_base:OK torvalds#184 tc_link_run_chain:OK Summary: 6/0 PASSED, 0 SKIPPED, 0 FAILED All new and existing test cases pass. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ANBZ: torvalds#184 It seems no need to print bootstrap and blob_dir path since they takes about ~2ms, which impacts boot speed of light-weight containers: {"msg":"[ 1.092236] erofs: (device erofs): RAFS bootstrap_path /run/kata-containers/shared/containers/rafs/[..]/bootstrap/image.boot","level":"INFO","ts":"2021-12-21T08:53:12.178819781+08:00","subsystem":"dmesg"} {"msg":"[ 1.094299] erofs: (device erofs): RAFS blob_dir_path /run/kata-containers/shared/containers/rafs/[..]/blob_cache_dir","level":"INFO","ts":"2021-12-21T08:53:12.180775832+08:00","subsystem":"dmesg"} Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

When executing mm selftests run_vmtests.sh, there is such error: BUG: Bad page state in process uffd-unit-tests pfn:00000 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0 flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff) raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 torvalds#184 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000 900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8 900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001 0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000 0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000 000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940 0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000 0000000000000000 90000000028f8940 ffff800000000000 0000000000000000 0000000000000000 0000000000000000 9000000000223a94 000000012001839c 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000000223a94>] show_stack+0x5c/0x180 [<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0 [<900000000056aa08>] bad_page+0x1a0/0x1f0 [<9000000000574978>] free_unref_folios+0xbf0/0xd20 [<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8 [<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260 [<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0 [<9000000000547f30>] tlb_finish_mmu+0xa8/0x218 [<9000000000543cb8>] exit_mmap+0x1a0/0x360 [<9000000000247658>] __mmput+0x78/0x200 [<900000000025583c>] do_exit+0x43c/0xde8 [<9000000000256490>] do_group_exit+0x68/0x110 [<9000000000256554>] sys_exit_group+0x1c/0x20 [<9000000001c413b4>] do_syscall+0x94/0x130 [<90000000002216d8>] handle_syscall+0xb8/0x158 Disabling lock debugging due to kernel taint BUG: non-zero pgtables_bytes on freeing mm: -16384 On LoongArch system, invalid huge pte entry should be invalid_pte_table or _PAGE_HUGE. And it need be the same with invalid pmd entry, since pmd_none() is called by function free_pgd_range(), pmd_none() return 0 by huge_pte_clear(). So _PAGE_HUGE is treated as valid pte table and free_pte_range() will be called in free_pmd_range(). free_pmd_range() pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); if (pmd_none_or_clear_bad(pmd)) continue; free_pte_range(tlb, pmd, addr); } while (pmd++, addr = next, addr != end); Here invalid_pte_table is used for both invalid huge pte entry and pmd entry. Fixes: 09cfefb ("LoongArch: Add memory management") Signed-off-by: Bibo Mao <maobibo@loongson.cn>

When executing mm selftests run_vmtests.sh, there is such an error: BUG: Bad page state in process uffd-unit-tests pfn:00000 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0 flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff) raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 torvalds#184 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000 900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8 900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001 0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000 0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000 000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940 0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000 0000000000000000 90000000028f8940 ffff800000000000 0000000000000000 0000000000000000 0000000000000000 9000000000223a94 000000012001839c 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000000223a94>] show_stack+0x5c/0x180 [<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0 [<900000000056aa08>] bad_page+0x1a0/0x1f0 [<9000000000574978>] free_unref_folios+0xbf0/0xd20 [<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8 [<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260 [<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0 [<9000000000547f30>] tlb_finish_mmu+0xa8/0x218 [<9000000000543cb8>] exit_mmap+0x1a0/0x360 [<9000000000247658>] __mmput+0x78/0x200 [<900000000025583c>] do_exit+0x43c/0xde8 [<9000000000256490>] do_group_exit+0x68/0x110 [<9000000000256554>] sys_exit_group+0x1c/0x20 [<9000000001c413b4>] do_syscall+0x94/0x130 [<90000000002216d8>] handle_syscall+0xb8/0x158 Disabling lock debugging due to kernel taint BUG: non-zero pgtables_bytes on freeing mm: -16384 On LoongArch system, invalid huge pte entry should be invalid_pte_table or a single _PAGE_HUGE bit rather than a zero value. And it should be the same with invalid pmd entry, since pmd_none() is called by function free_pgd_range() and pmd_none() return 0 by huge_pte_clear(). So single _PAGE_HUGE bit is also treated as a valid pte table and free_pte_range() will be called in free_pmd_range(). free_pmd_range() pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); if (pmd_none_or_clear_bad(pmd)) continue; free_pte_range(tlb, pmd, addr); } while (pmd++, addr = next, addr != end); Here invalid_pte_table is used for both invalid huge pte entry and pmd entry. Cc: stable@vger.kernel.org Fixes: 09cfefb ("LoongArch: Add memory management") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit 7cd1f5f upstream. When executing mm selftests run_vmtests.sh, there is such an error: BUG: Bad page state in process uffd-unit-tests pfn:00000 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0 flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff) raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 torvalds#184 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000 900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8 900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001 0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000 0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000 000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940 0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000 0000000000000000 90000000028f8940 ffff800000000000 0000000000000000 0000000000000000 0000000000000000 9000000000223a94 000000012001839c 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000000223a94>] show_stack+0x5c/0x180 [<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0 [<900000000056aa08>] bad_page+0x1a0/0x1f0 [<9000000000574978>] free_unref_folios+0xbf0/0xd20 [<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8 [<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260 [<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0 [<9000000000547f30>] tlb_finish_mmu+0xa8/0x218 [<9000000000543cb8>] exit_mmap+0x1a0/0x360 [<9000000000247658>] __mmput+0x78/0x200 [<900000000025583c>] do_exit+0x43c/0xde8 [<9000000000256490>] do_group_exit+0x68/0x110 [<9000000000256554>] sys_exit_group+0x1c/0x20 [<9000000001c413b4>] do_syscall+0x94/0x130 [<90000000002216d8>] handle_syscall+0xb8/0x158 Disabling lock debugging due to kernel taint BUG: non-zero pgtables_bytes on freeing mm: -16384 On LoongArch system, invalid huge pte entry should be invalid_pte_table or a single _PAGE_HUGE bit rather than a zero value. And it should be the same with invalid pmd entry, since pmd_none() is called by function free_pgd_range() and pmd_none() return 0 by huge_pte_clear(). So single _PAGE_HUGE bit is also treated as a valid pte table and free_pte_range() will be called in free_pmd_range(). free_pmd_range() pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); if (pmd_none_or_clear_bad(pmd)) continue; free_pte_range(tlb, pmd, addr); } while (pmd++, addr = next, addr != end); Here invalid_pte_table is used for both invalid huge pte entry and pmd entry. Cc: stable@vger.kernel.org Fixes: 09cfefb ("LoongArch: Add memory management") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

BugLink: https://bugs.launchpad.net/bugs/2102118 commit 7cd1f5f upstream. When executing mm selftests run_vmtests.sh, there is such an error: BUG: Bad page state in process uffd-unit-tests pfn:00000 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0 flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff) raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 torvalds#184 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000 900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8 900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001 0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000 0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000 000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940 0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000 0000000000000000 90000000028f8940 ffff800000000000 0000000000000000 0000000000000000 0000000000000000 9000000000223a94 000000012001839c 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000000223a94>] show_stack+0x5c/0x180 [<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0 [<900000000056aa08>] bad_page+0x1a0/0x1f0 [<9000000000574978>] free_unref_folios+0xbf0/0xd20 [<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8 [<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260 [<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0 [<9000000000547f30>] tlb_finish_mmu+0xa8/0x218 [<9000000000543cb8>] exit_mmap+0x1a0/0x360 [<9000000000247658>] __mmput+0x78/0x200 [<900000000025583c>] do_exit+0x43c/0xde8 [<9000000000256490>] do_group_exit+0x68/0x110 [<9000000000256554>] sys_exit_group+0x1c/0x20 [<9000000001c413b4>] do_syscall+0x94/0x130 [<90000000002216d8>] handle_syscall+0xb8/0x158 Disabling lock debugging due to kernel taint BUG: non-zero pgtables_bytes on freeing mm: -16384 On LoongArch system, invalid huge pte entry should be invalid_pte_table or a single _PAGE_HUGE bit rather than a zero value. And it should be the same with invalid pmd entry, since pmd_none() is called by function free_pgd_range() and pmd_none() return 0 by huge_pte_clear(). So single _PAGE_HUGE bit is also treated as a valid pte table and free_pte_range() will be called in free_pmd_range(). free_pmd_range() pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); if (pmd_none_or_clear_bad(pmd)) continue; free_pte_range(tlb, pmd, addr); } while (pmd++, addr = next, addr != end); Here invalid_pte_table is used for both invalid huge pte entry and pmd entry. Cc: stable@vger.kernel.org Fixes: 09cfefb ("LoongArch: Add memory management") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CVE-2024-56628 Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-torvalds#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-torvalds#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: <stable@vger.kernel.org> # v6.10 Fixes: 2651923 ("xfs: online repair of symbolic links") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-torvalds#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-torvalds#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>

[ Upstream commit 678e1cc ] xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-torvalds#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-torvalds#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

* tick/sched: Fix bogus condition in report_idle_softirq() In commit 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the new function report_idle_softirq() was created by breaking code out of the existing can_stop_idle_tick() for kernels v5.18 and newer. In doing so, the code essentially went from this form: if (A) { static int ratelimit; if (ratelimit < 10 && !C && A&D) { pr_warn("NOHZ tick-stop error: ..."); ratelimit++; } return false; } to a new function: static bool report_idle_softirq(void) { static int ratelimit; if (likely(!A)) return false; if (ratelimit < 10) return false; ... pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n", pending); ratelimit++; return true; } commit a7e282c77785 ("tick/rcu: Fix bogus ratelimit condition") realized ratelimit was essentially set to zero instead of ten, and hence *no* softirq pending messages would ever be issued, but "fixed" it as: - if (ratelimit < 10) + if (ratelimit >= 10) return false; However, this fix introduced another issue: When ratelimit is greater than or equal 10, even if A is true, it will directly return false. While ratelimit in the original code was only used to control printing and will not affect the return value. Restore the original logic and restrict ratelimit to control the printk and not the return value. Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") Fixes: a7e282c77785 ("tick/rcu: Fix bogus ratelimit condition") Signed-off-by: Wen Yang <wen.yang@linux.dev> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251119174525.29470-1-wen.yang@linux.dev * drm/amd: Skip power ungate during suspend for VPE During the suspend sequence VPE is already going to be power gated as part of vpe_suspend(). It's unnecessary to call during calls to amdgpu_device_set_pg_state(). It actually can expose a race condition with the firmware if s0i3 sequence starts as well. Drop these calls. Cc: Peyton.Lee@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a6c826cfeedd7714611ac115371a959ead55bda) Cc: stable@vger.kernel.org * drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled [Why] Accoreding to CP updated to RS64 on gfx11, WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW. That packet is used for MCBP on F32 based system. So it would lead to incorrect GRBM write and FW is not handling that extra case correctly. [How] With gfx11 rs64 enabled, skip emit de meta data. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8366cd442d226463e673bed5d199df916f4ecbcf) Cc: stable@vger.kernel.org * drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag This fixes sparse mappings (aka. partially resident textures). Check the correct flags. Since a recent refactor, the code works with uAPI flags (for mapping buffer objects), and not PTE (page table entry) flags. Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8feeab26c80635b802f72b3ed986c693ff8f3212) * drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like normal TT/doorbell/preempt memory and unconditionally accessed ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a NULL pointer dereference when computing PDE flags. Fix this by checking that ttm is non-NULL before reading ttm->caching. This prevents the crash for MMIO_REMAP and also makes the code more defensive if other BOs ever come through without a ttm_tt. Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement") Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Tested-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0db94da5a0a1cacda080b9ec8425fcbe4babc141) * drm/amdgpu: Add sriov vf check for VCN per queue reset support. Add SRIOV check when setting VCN ring's supported reset mask. Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ee9b603ad43f9870eb75184f9fb0a84f8c3cc852) Cc: stable@vger.kernel.org * spi: cadence-quadspi: Fix cqspi_probe() error handling for runtime pm Commit f1eb4e792bb1 ("spi: spi-cadence-quadspi: Enable pm runtime earlier to avoid imbalance") relocated code but missed updating the error handling path associated with it. Prior to the relocation, runtime pm was enabled after the code-block associated with 'cqspi_request_mmap_dma()', due to which, the error handling for the same didn't require invoking 'pm_runtime_disable()'. Post refactoring, runtime pm has been enabled before the code-block and when an error is encountered, jumping to 'probe_dma_failed' doesn't invoke 'pm_runtime_disable()'. This leads to a race condition wherein 'cqspi_runtime_suspend()' is invoked while the error handling path executes in parallel. The resulting error is the following: clk:103:0 already disabled WARNING: drivers/clk/clk.c:1188 at clk_core_disable+0x80/0xa0, CPU#1: kworker/u8:0/12 [TRIMMED] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : clk_core_disable+0x80/0xa0 lr : clk_core_disable+0x80/0xa0 [TRIMMED] Call trace: clk_core_disable+0x80/0xa0 (P) clk_core_disable_lock+0x88/0x10c clk_disable+0x24/0x30 cqspi_probe+0xa3c/0xae8 [TRIMMED] The error is due to the second invocation of 'clk_disable_unprepare()' on 'cqspi->clk' in the error handling within 'cqspi_probe()', with the first invocation being within 'cqspi_runtime_suspend()'. Fix this by correcting the error handling. Fixes: f1eb4e792bb1 ("spi: spi-cadence-quadspi: Enable pm runtime earlier to avoid imbalance") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://patch.msgid.link/20251119152545.2591651-1-s-vadapalli@ti.com Signed-off-by: Mark Brown <broonie@kernel.org> * wifi: rtw89: hw_scan: Don't let the operating channel be last Scanning can be offloaded to the firmware. To that end, the driver prepares a list of channels to scan, including periodic visits back to the operating channel, and sends the list to the firmware. When the channel list is too long to fit in a single H2C message, the driver splits the list, sends the first part, and tells the firmware to scan. When the scan is complete, the driver sends the next part of the list and tells the firmware to scan. When the last channel that fit in the H2C message is the operating channel something seems to go wrong in the firmware. It will acknowledge receiving the list of channels but apparently it will not do anything more. The AP can't be pinged anymore. The driver still receives beacons, though. One way to avoid this is to split the list of channels before the operating channel. Affected devices: * RTL8851BU with firmware 0.29.41.3 * RTL8832BU with firmware 0.29.29.8 * RTL8852BE with firmware 0.29.29.8 The commit 57a5fbe39a18 ("wifi: rtw89: refactor flow that hw scan handles channel list") is found by git blame, but it is actually to refine the scan flow, but not a culprit, so skip Fixes tag. Reported-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Closes: https://lore.kernel.org/linux-wireless/0abbda91-c5c2-4007-84c8-215679e652e1@gmail.com/ Cc: stable@vger.kernel.org # 6.16+ Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/c1e61744-8db4-4646-867f-241b47d30386@gmail.com * scsi: sg: Do not sleep in atomic context sg_finish_rem_req() calls blk_rq_unmap_user(). The latter function may sleep. Hence, call sg_finish_rem_req() with interrupts enabled instead of disabled. Reported-by: syzbot+c01f8e6e73f20459912e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-scsi/691560c4.a70a0220.3124cb.001a.GAE@google.com/ Cc: Hannes Reinecke <hare@suse.de> Cc: stable@vger.kernel.org Fixes: 97d27b0dd015 ("scsi: sg: close race condition in sg_remove_sfp_usercontext()") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251113181643.1108973-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> * mptcp: fix ack generation for fallback msk mptcp_cleanup_rbuf() needs to know the last most recent, mptcp-level rcv_wnd sent, and such information is tracked into the msk->old_wspace field, updated at ack transmission time by mptcp_write_options(). Fallback socket do not add any mptcp options, such helper is never invoked, and msk->old_wspace value remain stale. That in turn makes ack generation at recvmsg() time quite random. Address the issue ensuring mptcp_write_options() is invoked even for fallback sockets, and just update the needed info in such a case. The issue went unnoticed for a long time, as mptcp currently overshots the fallback socket receive buffer autotune significantly. It is going to change in the near future. Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-1-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: avoid unneeded subflow-level drops The rcv window is shared among all the subflows. Currently, MPTCP sync the TCP-level rcv window with the MPTCP one at tcp_transmit_skb() time. The above means that incoming data may sporadically observe outdated TCP-level rcv window and being wrongly dropped by TCP. Address the issue checking for the edge condition before queuing the data at TCP level, and eventually syncing the rcv window as needed. Note that the issue is actually present from the very first MPTCP implementation, but backports older than the blamed commit below will range from impossible to useless. Before: $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow TcpExtBeyondWindow 14 0.0 After: $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow TcpExtBeyondWindow 0 0.0 Fixes: fa3fe2b15031 ("mptcp: track window announced to peer") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-2-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: fix premature close in case of fallback I'm observing very frequent self-tests failures in case of fallback when running on a CONFIG_PREEMPT kernel. The root cause is that subflow_sched_work_if_closed() closes any subflow as soon as it is half-closed and has no incoming data pending. That works well for regular subflows - MPTCP needs bi-directional connectivity to operate on a given subflow - but for fallback socket is race prone. When TCP peer closes the connection before the MPTCP one, subflow_sched_work_if_closed() will schedule the MPTCP worker to gracefully close the subflow, and shortly after will do another schedule to inject and process a dummy incoming DATA_FIN. On CONFIG_PREEMPT kernel, the MPTCP worker can kick-in and close the fallback subflow before subflow_sched_work_if_closed() is able to create the dummy DATA_FIN, unexpectedly interrupting the transfer. Address the issue explicitly avoiding closing fallback subflows on when the peer is only half-closed. Note that, when the subflow is able to create the DATA_FIN before the worker invocation, the worker will change the msk state before trying to close the subflow and will skip the latter operation as the msk will not match anymore the precondition in __mptcp_close_subflow(). Fixes: f09b0ad55a11 ("mptcp: close subflow when receiving TCP+FIN") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-3-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: do not fallback when OoO is present In case of DSS corruption, the MPTCP protocol tries to avoid the subflow reset if fallback is possible. Such corruptions happen in the receive path; to ensure fallback is possible the stack additionally needs to check for OoO data, otherwise the fallback will break the data stream. Fixes: e32d262c89e2 ("mptcp: handle consistently DSS corruption") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/598 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-4-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: decouple mptcp fastclose from tcp close With the current fastclose implementation, the mptcp_do_fastclose() helper is in charge of two distinct actions: send the fastclose reset and cleanup the subflows. Formally decouple the two steps, ensuring that mptcp explicitly closes all the subflows after the mentioned helper. This will make the upcoming fix simpler, and allows dropping the 2nd argument from mptcp_destroy_common(). The Fixes tag is then the same as in the next commit to help with the backports. Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-5-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: fix duplicate reset on fastclose The CI reports sporadic failures of the fastclose self-tests. The root cause is a duplicate reset, not carrying the relevant MPTCP option. In the failing scenario the bad reset is received by the peer before the fastclose one, preventing the reception of the latter. Indeed there is window of opportunity at fastclose time for the following race: mptcp_do_fastclose __mptcp_close_ssk __tcp_close() tcp_set_state() [1] tcp_send_active_reset() [2] After [1] the stack will send reset to in-flight data reaching the now closed port. Such reset may race with [2]. Address the issue explicitly sending a single reset on fastclose before explicitly moving the subflow to close status. Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/596 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-6-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: join: fastclose: remove flaky marks After recent fixes like the parent commit, and "selftests: mptcp: connect: trunc: read all recv data", the two fastclose subtests no longer look flaky any more. It then feels fine to remove these flaky marks, to no longer ignore these subtests in case of errors. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-7-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: join: endpoints: longer timeout In rare cases, when the test environment is very slow, some endpoints tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to have a longer timeout, and even go over the default one. This connection will be killed at the end, after the verifications: increasing the timeout doesn't change anything, apart from avoiding it to end before the end of the verifications. To play it safe, all endpoints tests not waiting for the end of the transfer are now having a longer timeout: 2 minutes. The Fixes commit was making the connection longer, but still, the default timeout would have stopped it after 1 minute, which might not be enough in very slow environments. Fixes: 6457595db987 ("selftests: mptcp: join: endpoints: longer transfer") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Geliang Tang <geliang@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-8-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: join: userspace: longer timeout In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to have a longer timeout, and even go over the default one. This connection will be killed at the end, after the verifications: increasing the timeout doesn't change anything, apart from avoiding it to end before the end of the verifications. To play it safe, all userspace tests not waiting for the end of the transfer are now having a longer timeout: 2 minutes. The Fixes commit was making the connection longer, but still, the default timeout would have stopped it after 1 minute, which might not be enough in very slow environments. Fixes: 290493078b96 ("selftests: mptcp: join: userspace: longer transfer") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Geliang Tang <geliang@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-9-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: fix address removal logic in mptcp_pm_nl_rm_addr Fix inverted WARN_ON_ONCE condition that prevented normal address removal counter updates. The current code only executes decrement logic when the counter is already 0 (abnormal state), while normal removals (counter > 0) are ignored. Signed-off-by: Gang Yan <yangang@kylinos.cn> Fixes: 636113918508 ("mptcp: pm: remove '_nl' from mptcp_pm_nl_rm_addr_received") Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-10-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: add a check for 'add_addr_accepted' The previous patch fixed an issue with the 'add_addr_accepted' counter. This was not spot by the test suite. Check this counter and 'add_addr_signal' in MPTCP Join 'delete re-add signal' test. This should help spotting similar regressions later on. These counters are crucial for ensuring the MPTCP path manager correctly handles the subflow creation via 'ADD_ADDR'. Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-11-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * LoongArch: Use UAPI types in ptrace UAPI header The kernel UAPI headers already contain fixed-width integer types, there is no need to rely on the libc types. There may not be a libc available or the libc may not provides the <stdint.h>, like for example on nolibc. This also aligns the header with the rest of the LoongArch UAPI headers. Fixes: 803b0fc5c3f2 ("LoongArch: Add process management") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Consolidate CPU names in /proc/cpuinfo Some processors have no IOCSR.VENDOR and IOCSR.CPUNAME, some processors have these registers but there is no valid information. Consolidate CPU names in /proc/cpuinfo: 1. Add "PRID" to display the PRID & Core-Name; 2. Let "Model Name" display "Unknown" if no valid name. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Fix NUMA node parsing with numa_memblks On physical machine, NUMA node id comes from high bit 44:48 of physical address. However it is not true on virt machine. With general method, it comes from ACPI SRAT table. Here the common function numa_memblks_init() is used to parse NUMA node information with numa_memblks. Cc: <stable@vger.kernel.org> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Mask all interrupts during kexec/kdump If the default state of the interrupt controllers in the first kernel don't mask any interrupts, it may cause the second kernel to potentially receive interrupts (which were previously allocated by the first kernel) immediately after a CPU becomes online during its boot process. These interrupts cannot be properly routed, leading to bad IRQ issues. This patch calls machine_kexec_mask_interrupts() to mask all interrupts during the kexec/kdump process. Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Don't panic if no valid cache info for PCI If there is no valid cache info detected (may happen in virtual machine) for pci_dfl_cache_line_size, kernel shouldn't panic. Because in the PCI core it will be evaluated to (L1_CACHE_BYTES >> 2). Cc: <stable@vger.kernel.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: BPF: Disable trampoline for kernel module function trace The current LoongArch BPF trampoline implementation is incompatible with tracing functions in kernel modules. This causes several severe and user-visible problems: * The `bpf_selftests/module_attach` test fails consistently. * Kernel lockup when a BPF program is attached to a module function [1]. * Critical kernel modules like WireGuard experience traffic disruption when their functions are traced with fentry [2]. Given the severity and the potential for other unknown side-effects, it is safest to disable the feature entirely for now. This patch prevents the BPF subsystem from allowing trampoline attachments to kernel module functions on LoongArch. This is a temporary mitigation until the core issues in the trampoline code for kernel module handling can be identified and fixed. [root@fedora bpf]# ./test_progs -a module_attach -v bpf_testmod.ko is already unloaded. Loading bpf_testmod.ko... Successfully loaded bpf_testmod.ko. test_module_attach:PASS:skel_open 0 nsec test_module_attach:PASS:set_attach_target 0 nsec test_module_attach:PASS:set_attach_target_explicit 0 nsec test_module_attach:PASS:skel_load 0 nsec libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP test_module_attach:FAIL:skel_attach skeleton attach failed: -524 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Successfully unloaded bpf_testmod.ko. [1]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C13FMT58oqQ@mail.gmail.com/ [2]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcFaK93uoELPg@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: f9b6b41f0cf3 ("LoongArch: BPF: Add basic bpf trampoline support") Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * smb: client: introduce close_cached_dir_locked() Replace close_cached_dir() calls under cfid_list_lock with a new close_cached_dir_locked() variant that uses kref_put() instead of kref_put_lock() to avoid recursive locking when dropping references. While the existing code works if the refcount >= 2 invariant holds, this area has proven error-prone. Make deadlocks impossible and WARN on invariant violations. Cc: stable@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> * cifs: fix memory leak in smb3_fs_context_parse_param error path Add proper cleanup of ctx->source and fc->source to the cifs_parse_mount_err error handler. This ensures that memory allocated for the source strings is correctly freed on all error paths, matching the cleanup already performed in the success path by smb3_cleanup_fs_context_contents(). Pointers are also set to NULL after freeing to prevent potential double-free issues. This change fixes a memory leak originally detected by syzbot. The leak occurred when processing Opt_source mount options if an error happened after ctx->source and fc->source were successfully allocated but before the function completed. The specific leak sequence was: 1. ctx->source = smb3_fs_context_fullpath(ctx, '/') allocates memory 2. fc->source = kstrdup(ctx->source, GFP_KERNEL) allocates more memory 3. A subsequent error jumps to cifs_parse_mount_err 4. The old error handler freed passwords but not the source strings, causing the memory to leak. This issue was not addressed by commit e8c73eb7db0a ("cifs: client: fix memory leak in smb3_fs_context_parse_param"), which only fixed leaks from repeated fsconfig() calls but not this error path. Patch updated with minor change suggested by kernel test robot Reported-by: syzbot+87be6809ed9bf6d718e3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=87be6809ed9bf6d718e3 Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Steve French <stfrench@microsoft.com> * cifs: Add the smb3_read_* tracepoints to SMB1 Add the smb3_read_* tracepoints to SMB1's cifs_async_readv() and cifs_readv_callback(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> * perf: Fix 0 count issue of cpu-clock Currently cpu-clock event always returns 0 count, e.g., perf stat -e cpu-clock -- sleep 1 Performance counter stats for 'sleep 1': 0 cpu-clock # 0.000 CPUs utilized 1.002308394 seconds time elapsed The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle error of some clock events")' adds PERF_EF_UPDATE flag check before calling cpu_clock_event_update() to update the count, however the PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in counting mode (pmu->dev() -> cpu_clock_event_del() -> cpu_clock_event_stop()). This leads to the cpu-clock event count is never updated. To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event just like what task-clock does. Fixes: bc4394e5e79c ("perf: Fix the throttle error of some clock events") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com * xfs: fix out of bounds memory read error in symlink repair xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> * net: dsa: microchip: lan937x: Fix RGMII delay tuning Correct RGMII delay application logic in lan937x_set_tune_adj(). The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the new delay value. This caused the new value to be bitwise-OR'd with the existing PORT_TUNE_ADJ field instead of replacing it. For example, when setting the RGMII 2 TX delay on port 4, the intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was incorrectly OR'd with the default 0x1B (from register value 0xDA3), leaving the delay at the wrong setting. This patch adds the missing mask to clear the field, ensuring the correct delay value is written. Physical measurements on the RGMII TX lines confirm the fix, showing the delay changing from ~1ns (before change) to ~2ns. While testing on i.MX 8MP showed this was within the platform's timing tolerance, it did not match the intended hardware-characterized value. Fixes: b19ac41faa3f ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> * l2tp: reset skb control buffer on xmit The L2TP stack did not reset the skb control buffer before sending the encapsulated package. In a setup with an ath10k radio and batman-adv over an L2TP tunnel massive fragmentations happen sporadically if the L2TP tunnel is established over IPv4. L2TP might reset some of the fields in the IP control buffer, but L2TP assumes the type of the control buffer to be of an IPv4 packet. In case the L2TP interface is used as a batadv hardif or the packet is an IPv6 packet, this assumption breaks. Clear the entire control buffer to avoid such mishaps altogether. Fixes: f77ae9390438 ("[PPPOL2TP]: Reset meta-data in xmit function") Signed-off-by: David Bauer <mail@david-bauer.net> Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> * ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan() Call scsi_device_put() in ata_scsi_dev_rescan() if the device or its queue are not running. Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") Cc: stable@vger.kernel.org Signed-off-by: Yihang Li <liyihang9@h-partners.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> * ata: libata-scsi: Fix system suspend for a security locked drive Commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") fixed ata_to_sense_error() to properly generate sense key ABORTED COMMAND (without any additional sense code), instead of the previous bogus sense key ILLEGAL REQUEST with the additional sense code UNALIGNED WRITE COMMAND, for a failed command. However, this broke suspend for Security locked drives (drives that have Security enabled, and have not been Security unlocked by boot firmware). The reason for this is that the SCSI disk driver, for the Synchronize Cache command only, treats any sense data with sense key ILLEGAL REQUEST as a successful command (regardless of ASC / ASCQ). After commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") the code that treats any sense data with sense key ILLEGAL REQUEST as a successful command is no longer applicable, so the command fails, which causes the system suspend to be aborted: sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5 sd 1:0:0:0: PM: failed to suspend async: error -5 PM: Some devices failed to suspend, or early wake event detected To make suspend work once again, for a Security locked device only, return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense data which a real SCSI device would have returned if locked. The SCSI disk driver treats this sense data as a successful command. Cc: stable@vger.kernel.org Reported-by: Ilia Baryshnikov <qwelias@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704 Fixes: cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> * ata: libata-core: Set capacity to zero for a security locked drive For Security locked drives (drives that have Security enabled, and have not been Security unlocked by boot firmware), the automatic partition scanning will result in the user being spammed with errors such as: ata5.00: failed command: READ DMA ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 7 dma 4096 in res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error) ata5.00: status: { DRDY ERR } ata5.00: error: { ABRT } sd 4:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s sd 4:0:0:0: [sda] tag#7 Sense Key : Aborted Command [current] sd 4:0:0:0: [sda] tag#7 Add. Sense: No additional sense information during boot, because most commands except for IDENTIFY will be aborted by a Security locked drive. For a Security locked drive, set capacity to zero, so that no automatic partition scanning will happen. If the user later unlocks the drive using e.g. hdparm, the close() by the user space application should trigger a revalidation of the drive. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> * be2net: pass wrb_params in case of OS2BMC be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL pointer when processing a workaround for specific packet, as commit bc0c3405abbb ("be2net: fix a Tx stall bug caused by a specific ipv6 packet") states. The correct way would be to pass the wrb_params from be_xmit(). Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.") Cc: stable@vger.kernel.org Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> * vsock: Ignore signal/timeout on connect() if already established During connect(), acting on a signal/timeout by disconnecting an already established socket leads to several issues: 1. connect() invoking vsock_transport_cancel_pkt() -> virtio_transport_purge_skbs() may race with sendmsg() invoking virtio_transport_get_credit(). This results in a permanently elevated `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling. 2. connect() resetting a connected socket's state may race with socket being placed in a sockmap. A disconnected socket remaining in a sockmap breaks sockmap's assumptions. And gives rise to WARNs. 3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a transport change/drop after TCP_ESTABLISHED. Which poses a problem for any simultaneous sendmsg() or connect() and may result in a use-after-free/null-ptr-deref. Do not disconnect socket on signal/timeout. Keep the logic for unconnected sockets: they don't linger, can't be placed in a sockmap, are rejected by sendmsg(). [1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/ [2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/ [3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/ Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org> * timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths tk_aux_sysfs_init() returns immediately on error during the auxiliary clock initialization loop without cleaning up previously allocated kobjects and sysfs groups. If kobject_create_and_add() or sysfs_create_group() fails during loop iteration, the parent kobjects (tko and auxo) and any previously created child kobjects are leaked. Fix this by adding proper error handling with goto labels to ensure all allocated resources are cleaned up on failure. kobject_put() on the parent kobjects will handle cleanup of their children. Fixes: 7b95663a3d96 ("timekeeping: Provide interface to control auxiliary clocks") Signed-off-by: Malaya Kumar Rout <mrout@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251120150213.246777-1-mrout@redhat.com * io_uring/cmd_net: fix wrong argument types for skb_queue_splice() If timestamp retriving needs to be retried and the local list of SKB's already has entries, then it's spliced back into the socket queue. However, the arguments for the splice helper are transposed, causing exactly the wrong direction of splicing into the on-stack list. Fix that up. Cc: stable@vger.kernel.org Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-462435176@google.com> Fixes: 9e4ed359b8ef ("io_uring/netcmd: add tx timestamping cmd support") Signed-off-by: Jens Axboe <axboe@kernel.dk> * sched_ext: Fix scx_enable() crash on helper kthread creation failure A crash was observed when the sched_ext selftests runner was terminated with Ctrl+\ while test 15 was running: NIP [c00000000028fa58] scx_enable.constprop.0+0x358/0x12b0 LR [c00000000028fa2c] scx_enable.constprop.0+0x32c/0x12b0 Call Trace: scx_enable.constprop.0+0x32c/0x12b0 (unreliable) bpf_struct_ops_link_create+0x18c/0x22c __sys_bpf+0x23f8/0x3044 sys_bpf+0x2c/0x6c system_call_exception+0x124/0x320 system_call_vectored_common+0x15c/0x2ec kthread_run_worker() returns an ERR_PTR() on failure rather than NULL, but the current code in scx_alloc_and_add_sched() only checks for a NULL helper. Incase of failure on SIGQUIT, the error is not handled in scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an error pointer. Error handling is fixed in scx_alloc_and_add_sched() to propagate PTR_ERR() into ret, so that scx_enable() jumps to the existing error path, avoiding random dereference on failure. Fixes: bff3b5aec1b7 ("sched_ext: Move disable machinery into scx_sched") Cc: stable@vger.kernel.org # v6.16+ Reported-and-tested-by: Samir Mulani <samir@linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org> * dm: fix failure when empty flush's bi_sector points beyond the device end An empty flush bio can have arbitrary bi_sector. The commit 2b1c6d7a890a introduced a regression that device mapper would fail an empty flush bio with -EIO if the sector pointed beyond the end of the device. The commit introduced an optimization, that optimization would pass flushes to __split_and_process_bio and __split_and_process_bio is not prepared to handle empty bios. Fix this bug by passing only non-empty flushes to __split_and_process_bio - non-empty flushes must have valid bi_sector. Empty bios will go through __send_empty_flush, as they did before the optimization. This problem can be reproduced by running the lvm2 test: make check_local T=lvconvert-thin.sh LVM_TEST_PREFER_BRD=0 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 2b1c6d7a890a ("dm: optimize REQ_PREFLUSH with data when using the linear target") Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> * selinux: rename task_security_struct to cred_security_struct Before Linux had cred structures, the SELinux task_security_struct was per-task and although the structure was switched to being per-cred long ago, the name was never updated. This change renames it to cred_security_struct to avoid confusion and pave the way for the introduction of an actual per-task security structure for SELinux. No functional change. Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> * selinux: move avdcache to per-task security struct The avdcache is meant to be per-task; move it to a new task_security_struct that is duplicated per-task. Cc: stable@vger.kernel.org Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead") Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: line length fixes] Signed-off-by: Paul Moore <paul@paul-moore.com> * selinux: rename the cred_security_struct variables to "crsec" Along with the renaming from task_security_struct to cred_security_struct, rename the local variables to "crsec" from "tsec". This both fits with existing conventions and helps distinguish between task and cred related variables. No functional changes. Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> * Bluetooth: btusb: mediatek: Fix kernel crash when releasing mtk iso interface When performing reset tests and encountering abnormal card drop issues that lead to a kernel crash, it is necessary to perform a null check before releasing resources to avoid attempting to release a null pointer. <4>[ 29.158070] Hardware name: Google Quigon sku196612/196613 board (DT) <4>[ 29.158076] Workqueue: hci0 hci_cmd_sync_work [bluetooth] <4>[ 29.158154] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) <4>[ 29.158162] pc : klist_remove+0x90/0x158 <4>[ 29.158174] lr : klist_remove+0x88/0x158 <4>[ 29.158180] sp : ffffffc0846b3c00 <4>[ 29.158185] pmr_save: 000000e0 <4>[ 29.158188] x29: ffffffc0846b3c30 x28: ffffff80cd31f880 x27: ffffff80c1bdc058 <4>[ 29.158199] x26: dead000000000100 x25: ffffffdbdc624ea3 x24: ffffff80c1bdc4c0 <4>[ 29.158209] x23: ffffffdbdc62a3e6 x22: ffffff80c6c07000 x21: ffffffdbdc829290 <4>[ 29.158219] x20: 0000000000000000 x19: ffffff80cd3e0648 x18: 000000031ec97781 <4>[ 29.158229] x17: ffffff80c1bdc4a8 x16: ffffffdc10576548 x15: ffffff80c1180428 <4>[ 29.158238] x14: 0000000000000000 x13: 000000000000e380 x12: 0000000000000018 <4>[ 29.158248] x11: ffffff80c2a7fd10 x10: 0000000000000000 x9 : 0000000100000000 <4>[ 29.158257] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 2d7223ff6364626d <4>[ 29.158266] x5 : 0000008000000000 x4 : 0000000000000020 x3 : 2e7325006465636e <4>[ 29.158275] x2 : ffffffdc11afeff8 x1 : 0000000000000000 x0 : ffffffdc11be4d0c <4>[ 29.158285] Call trace: <4>[ 29.158290] klist_remove+0x90/0x158 <4>[ 29.158298] device_release_driver_internal+0x20c/0x268 <4>[ 29.158308] device_release_driver+0x1c/0x30 <4>[ 29.158316] usb_driver_release_interface+0x70/0x88 <4>[ 29.158325] btusb_mtk_release_iso_intf+0x68/0xd8 [btusb (HASH:e8b6 5)] <4>[ 29.158347] btusb_mtk_reset+0x5c/0x480 [btusb (HASH:e8b6 5)] <4>[ 29.158361] hci_cmd_sync_work+0x10c/0x188 [bluetooth (HASH:a4fa 6)] <4>[ 29.158430] process_scheduled_works+0x258/0x4e8 <4>[ 29.158441] worker_thread+0x300/0x428 <4>[ 29.158448] kthread+0x108/0x1d0 <4>[ 29.158455] ret_from_fork+0x10/0x20 <0>[ 29.158467] Code: 91343000 940139d1 f9400268 927ff914 (f9401297) <4>[ 29.158474] ---[ end trace 0000000000000000 ]--- <0>[ 29.167129] Kernel panic - not syncing: Oops: Fatal exception <2>[ 29.167144] SMP: stopping secondary CPUs <4>[ 29.167158] ------------[ cut here ]------------ Fixes: ceac1cb0259d ("Bluetooth: btusb: mediatek: add ISO data transmission functions") Signed-off-by: Chris Lu <chris.lu@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: hci_core: Fix triggering cmd_timer for HCI_OP_NOP HCI_OP_NOP means no command was actually sent so there is no point in triggering cmd_timer which may cause a hdev->reset in the process since it is assumed that the controller is stuck processing a command. Fixes: e2d471b7806b ("Bluetooth: ISO: Fix not using SID from adv report") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: hci_sock: Prevent race in socket write iter and sock bind There is a potential race condition between sock bind and socket write iter. bind may free the same cmd via mgmt_pending before write iter sends the cmd, just as syzbot reported in UAF[1]. Here we use hci_dev_lock to synchronize the two, thereby avoiding the UAF mentioned in [1]. [1] syzbot reported: BUG: KASAN: slab-use-after-free in mgmt_pending_remove+0x3b/0x210 net/bluetooth/mgmt_util.c:316 Read of size 8 at addr ffff888077164818 by task syz.0.17/5989 Call Trace: mgmt_pending_remove+0x3b/0x210 net/bluetooth/mgmt_util.c:316 set_link_security+0x5c2/0x710 net/bluetooth/mgmt.c:1918 hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719 hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 sock_write_iter+0x279/0x360 net/socket.c:1195 Allocated by task 5989: mgmt_pending_add+0x35/0x140 net/bluetooth/mgmt_util.c:296 set_link_security+0x557/0x710 net/bluetooth/mgmt.c:1910 hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719 hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 sock_write_iter+0x279/0x360 net/socket.c:1195 Freed by task 5991: mgmt_pending_free net/bluetooth/mgmt_util.c:311 [inline] mgmt_pending_foreach+0x30d/0x380 net/bluetooth/mgmt_util.c:257 mgmt_index_removed+0x112/0x2f0 net/bluetooth/mgmt.c:9477 hci_sock_bind+0xbe9/0x1000 net/bluetooth/hci_sock.c:1314 Fixes: 6fe26f694c82 ("Bluetooth: MGMT: Protect mgmt_pending list with its own lock") Reported-by: syzbot+9aa47cd4633a3cf92a80@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9aa47cd4633a3cf92a80 Tested-by: syzbot+9aa47cd4633a3cf92a80@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: hci_core: lookup hci_conn on RX path on protocol side The hdev lock/lookup/unlock/use pattern in the packet RX path doesn't ensure hci_conn* is not concurrently modified/deleted. This locking appears to be leftover from before conn_hash started using RCU commit bf4c63252490b ("Bluetooth: convert conn hash to RCU") and not clear if it had purpose since then. Currently, there are code paths that delete hci_conn* from elsewhere than the ordered hdev->workqueue where the RX work runs in. E.g. commit 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") introduced some of these, and there probably were a few others before it. It's better to do the locking so that even if these run concurrently no UAF is possible. Move the lookup of hci_conn and associated socket-specific conn to protocol recv handlers, and do them within a single critical section to cover hci_conn* usage and lookup. syzkaller has reported a crash that appears to be this issue: [Task hdev->workqueue] [Task 2] hci_disconnect_all_sync l2cap_recv_acldata(hcon) hci_conn_get(hcon) hci_abort_conn_sync(hcon) hci_dev_lock hci_dev_lock hci_conn_del(hcon) v-------------------------------- hci_dev_unlock hci_conn_put(hcon) conn = hcon->l2cap_data (UAF) Fixes: 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") Reported-by: syzbot+d32d77220b92eddd89ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d32d77220b92eddd89ad Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: btusb: mediatek: Avoid btusb_mtk_claim_iso_intf() NULL deref In btusb_mtk_setup(), we set `btmtk_data->isopkt_intf` to: usb_ifnum_to_if(data->udev, MTK_ISO_IFNUM) That function can return NULL in some cases. Even when it returns NULL, though, we still go on to call btusb_mtk_claim_iso_intf(). As of commit e9087e828827 ("Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface()"), calling btusb_mtk_claim_iso_intf() when `btmtk_data->isopkt_intf` is NULL will cause a crash because we'll end up passing a bad pointer to device_lock(). Prior to that commit we'd pass the NULL pointer directly to usb_driver_claim_interface() which would detect it and return an error, which was handled. Resolve the crash in btusb_mtk_claim_iso_intf() by adding a NULL check at the start of the function. This makes the code handle a NULL `btmtk_data->isopkt_intf` the same way it did before the problematic commit (just with a slight change to the error message printed). Reported-by: IncogCyberpunk <incogcyberpunk@proton.me> Closes: http://lore.kernel.org/r/a380d061-479e-4713-bddd-1d6571ca7e86@leemhuis.info Fixes: e9087e828827 ("Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface()") Cc: stable@vger.kernel.org Tested-by: IncogCyberpunk <incogcyberpunk@proton.me> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: SMP: Fix not generating mackey and ltk when repairing The change eed467b517e8 ("Bluetooth: fix passkey uninitialized when used") introduced a goto that bypasses the creation of temporary mackey and ltk which are later used by the likes of DHKey Check step. Later ffee202a78c2 ("Bluetooth: Always request for user confirmation for Just Works (LE SC)") which means confirm_hint is always set in case JUST_WORKS so the branch checking for an existing LTK becomes pointless as confirm_hint will always be set, so this just merge both cases of malicious or legitimate devices to be confirmed before continuing with the pairing procedure. Link: https://github.com/bluez/bluez/issues/1622 Fixes: eed467b517e8 ("Bluetooth: fix passkey uninitialized when used") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * net: atm: fix incorrect cleanup function call in error path In atm_init(), if atmsvc_init() fails, the code jumps to out_atmpvc_exit label which incorrectly calls atmsvc_exit() instead of atmpvc_exit(). This results in calling the wrong cleanup function and failing to properly clean up atmpvc_init(). Fix this by calling atmpvc_exit() in the out_atmpvc_exit error path. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sayooj K Karun <sayooj@aerlync.com> Link: https://patch.msgid.link/20251119085747.67139-1-sayooj@aerlync.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> * veth: reduce XDP no_direct return section to fix race As explain in commit fa349e396e48 ("veth: Fix race with AF_XDP exposing old or uninitialized descriptors") for veth there is a chance after napi_complete_done() that another CPU can manage start another NAPI instance running veth_pool(). For NAPI this is correctly handled as the napi_schedule_prep() check will prevent multiple instances from getting scheduled, but for the remaining code in veth_pool() this can run concurrent with the newly started NAPI instance. The problem/race is that xdp_clear_return_frame_no_direct() isn't designed to be nested. Prior to commit 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") the temporary BPF net context bpf_redirect_info was stored per CPU, where this wasn't an issue. Since this commit the BPF context is stored in 'current' task_struct. When running veth in threaded-NAPI mode, then the kthread becomes the storage area. Now a race exists between two concurrent veth_pool() function calls one exiting NAPI and one running new NAPI, both using the same BPF net context. Race is when another CPU gets within the xdp_set_return_frame_no_direct() section before exiting veth_pool() calls the clear-function xdp_clear_return_frame_no_direct(). Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/176356963888.337072.4805242001928705046.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org> * ALSA: hda/realtek: add quirk for HP pavilion aero laptop 13z-be200 The laptop uses ALC287 chip (as shown in /proc/asound/card1/codec#0). It seems that every HP pavilion laptop in the table uses the same quirk, so I just copied them. I have verified that the mute LED on my laptop works with this patch. For reference, here's the alsa-info of my laptop: https://alsa-project.org/db/?f=2d5f297087708610bc01816ab12052abdd4a17c0 Signed-off-by: Jacob Zhong <cmpute@qq.com> Link: https://patch.msgid.link/tencent_E2DFA33EFDF39E0517A94FA8FF06C05C0709@qq.com Signed-off-by: Takashi Iwai <tiwai@suse.de> * dm-verity: fix unreliable memory allocation GFP_NOWAIT allocation may fail anytime. It needs to be changed to GFP_NOIO. There's no need to handle an error because mempool_alloc with GFP_NOIO can't fail. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Biggers <ebiggers@kernel.org> * MIPS: mm: Prevent a TLB shutdown on initial uniquification Depending on the particular CPU implementation a TLB shutdown may occur if multiple matching entries are detected upon the execution of a TLBP or the TLBWI/TLBWR instructions. Given that we don't know what entries we have been handed we need to be very careful with the initial TLB setup and avoid all these instructions. Therefore read all the TLB entries one by one with the TLBR instruction, bypassing the content addressing logic, and truncate any large pages in place so as to avoid a case in the second step where an incoming entry for a large page at a lower address overlaps with a replacement entry chosen at another index. Then preinitialize the TLB using addresses outside our usual unique range and avoiding clashes with any entries received, before making the usual call to local_flush_tlb_all(). This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual address). Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init") Cc: stable@vger.kernel.org Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> * MIPS: kernel: Fix random segmentation faults Commit 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation") switches to a generic vdso storage, which increases the number of data pages from 1 to 4. But there is only one page reserved, which causes segementation faults depending where the VDSO area is randomized to. To fix this use the same size of reservation and allocation of the VDSO data pages. Fixes: 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation") Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> * usb: storage: sddr55: Reject out-of-bound new_pba Discovered by Atuin - Automated Vulnerability Discovery Engine. new_pba comes from the status packet returned after each write. A bogus device could report values beyond the block count derived from info->capacity, letting the driver walk off the end of pba_to_lba[] and corrupt heap memory. Reject PBAs that exceed the computed block count and fail the transfer so we avoid touching out-of-range mapping entries. Signed-off-by: Tianchu Chen <flynnnchen@tencent.com> Cc: stable <stable@kernel.org> Link: https://patch.msgid.link/B2DC73A3EE1E3A1D+202511161322001664687@tencent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * xhci: dbgtty: fix device unregister When DbC is disconnected then xhci_dbc_tty_unregister_device() is called. However if there is any user space process blocked on write to DbC terminal device then it will never be signalled and thus stay blocked indifinitely. This fix adds a tty_vhangup() call in xhci_dbc_tty_unregister_device(). The tty_vhangup() wakes up any blocked writers and causes subsequent write attempts to DbC terminal device to fail. Cc: stable <stable@kernel.org> Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver") Signed-off-by: Łukasz Bartosik <ukaszb@chromium.org> Link: https://patch.msgid.link/20251119212910.1245694-1-ukaszb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * usb: dwc3: Fix race condition between concurrent dwc3_remove_requests() call paths This patch addresses a race condition caused by unsynchronized execution of multiple call paths invoking `dwc3_remove_requests()`, leading to premature freeing of USB requests and subsequent crashes. Three distinct execution paths interact with `dwc3_remove_requests()`: Path 1: Triggered via `dwc3_gadget_reset_interrupt()` during USB reset handling. The call stack includes: - `dwc3_ep0_reset_state()` - `dwc3_ep0_stall_and_restart()` - `dwc3_ep0_out_start()` - `dwc3_remove_requests()` - `dwc3_gadget_del_and_unmap_request()` Path 2: Also initiated from `dwc3_gadget_reset_interrupt()`, but through `dwc3_stop_active_transfers()`. The call stack includes: - `dwc3_stop_active_transfers()` - `dwc3_remove_requests()` - `dwc3_gadget_del_and_unmap_request()` Path 3: Occurs independently during `adb root` execution, which triggers USB function unbind and bind operations. The sequence includes: - `gserial_disconnect()` - `usb_ep_disable()` - `dwc3_gadget_ep_disable()` - `dwc3_remove_requests()` with `-ESHUTDOWN` status Path 3 operates asynchronously and lacks synchronization with Paths 1 and 2. When Path 3 completes, it disables endpoints and frees 'out' requests. If Paths 1 or 2 are still processing these requests, accessing freed memory leads to a crash due to use-after-free conditions. To fix this added check for request completion and skip processing if already completed and added the request status for ep0 while queue. Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver") Cc: stable <stable@kernel.org> Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Manish Nagar <manish.nagar@oss.qualcomm.com> Link: https://patch.msgid.link/20251120074435.1983091-1-manish.nagar@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * usb: uas: fix urb unmapping issue when the uas device is remove during ongoing data transfer When a UAS device is unplugged during data transfer, there is a probability of a system panic occurring. The root cause is an access to an invalid memory address during URB callback handling. Specifically, this happens when the dma_direct_unmap_sg() function is called within the usb_hcd_unmap_urb_for_dma() interface, but the sg->dma_address field is 0 and the sg data structure has already been freed. The SCSI driver sends transfer commands by invoking uas_queuecommand_lck() in uas.c, using the uas_submit_urbs() function to submit requests to USB. Within the uas_submit_urbs() implementation, three URBs (sense_urb, data_urb, and cmd_urb) are sequentially submitted. Device removal may occur at any point during uas_submit_urbs execution, which may result in URB submission failure. However, some URBs might have been successfully submitted before the failure, and uas_submit_urbs will return the -ENODEV error code in this case. The current error handling directly calls scsi_done(). In the SCSI driver, this eventually triggers scsi_complete() to invoke scsi_end_request() for releasing the sgtable. The successfully submitted URBs, when being unlinked to giveback, call usb_hcd_unmap_urb_for_dma() in hcd.c, leading to exceptions during sg unmapping operations since the sg data structure has already been freed. This patch modifies the error condition check in the uas_submit_urbs() function. When a UAS device is removed but one or more URBs have already been successfully submitted to USB, it avoids immediately invoking scsi_done() and save the cmnd to devinfo->cmnd array. If the successfully submitted URBs is completed before devinfo->resetting being set, then the scsi_done() function will be called within uas_try_complete() after all pending URB operations are finalized. Otherwise, the scsi_done() function will be called within uas_zap_pending(), which is executed after usb_kill_anchored_urbs(). The error handling only takes effect when uas_queuecommand_lck() calls uas_submit_urbs() and returns the error value -ENODEV . In this case, the device is disconnected, and the flow proceeds to uas_disconnect(), where uas_zap_pending() is invoked to call uas_try_complete(). Fixes: eb2a86ae8c54 ("USB: UAS: fix disconnect by unplugging a hub") Cc: stable <stable@kernel.org> Signed-off-by: Yu Chen <chenyu45@xiaomi.com> Signed-off-by: Owen Gu <guhuinan@xiaomi.com> Acked-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20251120123336.3328-1-guhuinan@xiaomi.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * spi: spi-fsl-lpspi: fix watermark truncation caused by type cast 't->len' is an unsigned integer, while 'watermark' and 'txfifosize' are u8. Using min_t with typ…

* tick/sched: Fix bogus condition in report_idle_softirq() In commit 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the new function report_idle_softirq() was created by breaking code out of the existing can_stop_idle_tick() for kernels v5.18 and newer. In doing so, the code essentially went from this form: if (A) { static int ratelimit; if (ratelimit < 10 && !C && A&D) { pr_warn("NOHZ tick-stop error: ..."); ratelimit++; } return false; } to a new function: static bool report_idle_softirq(void) { static int ratelimit; if (likely(!A)) return false; if (ratelimit < 10) return false; ... pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n", pending); ratelimit++; return true; } commit a7e282c77785 ("tick/rcu: Fix bogus ratelimit condition") realized ratelimit was essentially set to zero instead of ten, and hence *no* softirq pending messages would ever be issued, but "fixed" it as: - if (ratelimit < 10) + if (ratelimit >= 10) return false; However, this fix introduced another issue: When ratelimit is greater than or equal 10, even if A is true, it will directly return false. While ratelimit in the original code was only used to control printing and will not affect the return value. Restore the original logic and restrict ratelimit to control the printk and not the return value. Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") Fixes: a7e282c77785 ("tick/rcu: Fix bogus ratelimit condition") Signed-off-by: Wen Yang <wen.yang@linux.dev> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251119174525.29470-1-wen.yang@linux.dev * drm/amd: Skip power ungate during suspend for VPE During the suspend sequence VPE is already going to be power gated as part of vpe_suspend(). It's unnecessary to call during calls to amdgpu_device_set_pg_state(). It actually can expose a race condition with the firmware if s0i3 sequence starts as well. Drop these calls. Cc: Peyton.Lee@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a6c826cfeedd7714611ac115371a959ead55bda) Cc: stable@vger.kernel.org * drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled [Why] Accoreding to CP updated to RS64 on gfx11, WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW. That packet is used for MCBP on F32 based system. So it would lead to incorrect GRBM write and FW is not handling that extra case correctly. [How] With gfx11 rs64 enabled, skip emit de meta data. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8366cd442d226463e673bed5d199df916f4ecbcf) Cc: stable@vger.kernel.org * drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag This fixes sparse mappings (aka. partially resident textures). Check the correct flags. Since a recent refactor, the code works with uAPI flags (for mapping buffer objects), and not PTE (page table entry) flags. Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8feeab26c80635b802f72b3ed986c693ff8f3212) * drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like normal TT/doorbell/preempt memory and unconditionally accessed ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a NULL pointer dereference when computing PDE flags. Fix this by checking that ttm is non-NULL before reading ttm->caching. This prevents the crash for MMIO_REMAP and also makes the code more defensive if other BOs ever come through without a ttm_tt. Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement") Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Tested-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0db94da5a0a1cacda080b9ec8425fcbe4babc141) * drm/amdgpu: Add sriov vf check for VCN per queue reset support. Add SRIOV check when setting VCN ring's supported reset mask. Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ee9b603ad43f9870eb75184f9fb0a84f8c3cc852) Cc: stable@vger.kernel.org * spi: cadence-quadspi: Fix cqspi_probe() error handling for runtime pm Commit f1eb4e792bb1 ("spi: spi-cadence-quadspi: Enable pm runtime earlier to avoid imbalance") relocated code but missed updating the error handling path associated with it. Prior to the relocation, runtime pm was enabled after the code-block associated with 'cqspi_request_mmap_dma()', due to which, the error handling for the same didn't require invoking 'pm_runtime_disable()'. Post refactoring, runtime pm has been enabled before the code-block and when an error is encountered, jumping to 'probe_dma_failed' doesn't invoke 'pm_runtime_disable()'. This leads to a race condition wherein 'cqspi_runtime_suspend()' is invoked while the error handling path executes in parallel. The resulting error is the following: clk:103:0 already disabled WARNING: drivers/clk/clk.c:1188 at clk_core_disable+0x80/0xa0, CPU#1: kworker/u8:0/12 [TRIMMED] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : clk_core_disable+0x80/0xa0 lr : clk_core_disable+0x80/0xa0 [TRIMMED] Call trace: clk_core_disable+0x80/0xa0 (P) clk_core_disable_lock+0x88/0x10c clk_disable+0x24/0x30 cqspi_probe+0xa3c/0xae8 [TRIMMED] The error is due to the second invocation of 'clk_disable_unprepare()' on 'cqspi->clk' in the error handling within 'cqspi_probe()', with the first invocation being within 'cqspi_runtime_suspend()'. Fix this by correcting the error handling. Fixes: f1eb4e792bb1 ("spi: spi-cadence-quadspi: Enable pm runtime earlier to avoid imbalance") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://patch.msgid.link/20251119152545.2591651-1-s-vadapalli@ti.com Signed-off-by: Mark Brown <broonie@kernel.org> * wifi: rtw89: hw_scan: Don't let the operating channel be last Scanning can be offloaded to the firmware. To that end, the driver prepares a list of channels to scan, including periodic visits back to the operating channel, and sends the list to the firmware. When the channel list is too long to fit in a single H2C message, the driver splits the list, sends the first part, and tells the firmware to scan. When the scan is complete, the driver sends the next part of the list and tells the firmware to scan. When the last channel that fit in the H2C message is the operating channel something seems to go wrong in the firmware. It will acknowledge receiving the list of channels but apparently it will not do anything more. The AP can't be pinged anymore. The driver still receives beacons, though. One way to avoid this is to split the list of channels before the operating channel. Affected devices: * RTL8851BU with firmware 0.29.41.3 * RTL8832BU with firmware 0.29.29.8 * RTL8852BE with firmware 0.29.29.8 The commit 57a5fbe39a18 ("wifi: rtw89: refactor flow that hw scan handles channel list") is found by git blame, but it is actually to refine the scan flow, but not a culprit, so skip Fixes tag. Reported-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Closes: https://lore.kernel.org/linux-wireless/0abbda91-c5c2-4007-84c8-215679e652e1@gmail.com/ Cc: stable@vger.kernel.org # 6.16+ Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/c1e61744-8db4-4646-867f-241b47d30386@gmail.com * scsi: sg: Do not sleep in atomic context sg_finish_rem_req() calls blk_rq_unmap_user(). The latter function may sleep. Hence, call sg_finish_rem_req() with interrupts enabled instead of disabled. Reported-by: syzbot+c01f8e6e73f20459912e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-scsi/691560c4.a70a0220.3124cb.001a.GAE@google.com/ Cc: Hannes Reinecke <hare@suse.de> Cc: stable@vger.kernel.org Fixes: 97d27b0dd015 ("scsi: sg: close race condition in sg_remove_sfp_usercontext()") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251113181643.1108973-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> * mptcp: fix ack generation for fallback msk mptcp_cleanup_rbuf() needs to know the last most recent, mptcp-level rcv_wnd sent, and such information is tracked into the msk->old_wspace field, updated at ack transmission time by mptcp_write_options(). Fallback socket do not add any mptcp options, such helper is never invoked, and msk->old_wspace value remain stale. That in turn makes ack generation at recvmsg() time quite random. Address the issue ensuring mptcp_write_options() is invoked even for fallback sockets, and just update the needed info in such a case. The issue went unnoticed for a long time, as mptcp currently overshots the fallback socket receive buffer autotune significantly. It is going to change in the near future. Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-1-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: avoid unneeded subflow-level drops The rcv window is shared among all the subflows. Currently, MPTCP sync the TCP-level rcv window with the MPTCP one at tcp_transmit_skb() time. The above means that incoming data may sporadically observe outdated TCP-level rcv window and being wrongly dropped by TCP. Address the issue checking for the edge condition before queuing the data at TCP level, and eventually syncing the rcv window as needed. Note that the issue is actually present from the very first MPTCP implementation, but backports older than the blamed commit below will range from impossible to useless. Before: $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow TcpExtBeyondWindow 14 0.0 After: $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow TcpExtBeyondWindow 0 0.0 Fixes: fa3fe2b15031 ("mptcp: track window announced to peer") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-2-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: fix premature close in case of fallback I'm observing very frequent self-tests failures in case of fallback when running on a CONFIG_PREEMPT kernel. The root cause is that subflow_sched_work_if_closed() closes any subflow as soon as it is half-closed and has no incoming data pending. That works well for regular subflows - MPTCP needs bi-directional connectivity to operate on a given subflow - but for fallback socket is race prone. When TCP peer closes the connection before the MPTCP one, subflow_sched_work_if_closed() will schedule the MPTCP worker to gracefully close the subflow, and shortly after will do another schedule to inject and process a dummy incoming DATA_FIN. On CONFIG_PREEMPT kernel, the MPTCP worker can kick-in and close the fallback subflow before subflow_sched_work_if_closed() is able to create the dummy DATA_FIN, unexpectedly interrupting the transfer. Address the issue explicitly avoiding closing fallback subflows on when the peer is only half-closed. Note that, when the subflow is able to create the DATA_FIN before the worker invocation, the worker will change the msk state before trying to close the subflow and will skip the latter operation as the msk will not match anymore the precondition in __mptcp_close_subflow(). Fixes: f09b0ad55a11 ("mptcp: close subflow when receiving TCP+FIN") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-3-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: do not fallback when OoO is present In case of DSS corruption, the MPTCP protocol tries to avoid the subflow reset if fallback is possible. Such corruptions happen in the receive path; to ensure fallback is possible the stack additionally needs to check for OoO data, otherwise the fallback will break the data stream. Fixes: e32d262c89e2 ("mptcp: handle consistently DSS corruption") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/598 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-4-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: decouple mptcp fastclose from tcp close With the current fastclose implementation, the mptcp_do_fastclose() helper is in charge of two distinct actions: send the fastclose reset and cleanup the subflows. Formally decouple the two steps, ensuring that mptcp explicitly closes all the subflows after the mentioned helper. This will make the upcoming fix simpler, and allows dropping the 2nd argument from mptcp_destroy_common(). The Fixes tag is then the same as in the next commit to help with the backports. Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-5-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: fix duplicate reset on fastclose The CI reports sporadic failures of the fastclose self-tests. The root cause is a duplicate reset, not carrying the relevant MPTCP option. In the failing scenario the bad reset is received by the peer before the fastclose one, preventing the reception of the latter. Indeed there is window of opportunity at fastclose time for the following race: mptcp_do_fastclose __mptcp_close_ssk __tcp_close() tcp_set_state() [1] tcp_send_active_reset() [2] After [1] the stack will send reset to in-flight data reaching the now closed port. Such reset may race with [2]. Address the issue explicitly sending a single reset on fastclose before explicitly moving the subflow to close status. Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/596 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-6-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: join: fastclose: remove flaky marks After recent fixes like the parent commit, and "selftests: mptcp: connect: trunc: read all recv data", the two fastclose subtests no longer look flaky any more. It then feels fine to remove these flaky marks, to no longer ignore these subtests in case of errors. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-7-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: join: endpoints: longer timeout In rare cases, when the test environment is very slow, some endpoints tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to have a longer timeout, and even go over the default one. This connection will be killed at the end, after the verifications: increasing the timeout doesn't change anything, apart from avoiding it to end before the end of the verifications. To play it safe, all endpoints tests not waiting for the end of the transfer are now having a longer timeout: 2 minutes. The Fixes commit was making the connection longer, but still, the default timeout would have stopped it after 1 minute, which might not be enough in very slow environments. Fixes: 6457595db987 ("selftests: mptcp: join: endpoints: longer transfer") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Geliang Tang <geliang@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-8-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: join: userspace: longer timeout In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to have a longer timeout, and even go over the default one. This connection will be killed at the end, after the verifications: increasing the timeout doesn't change anything, apart from avoiding it to end before the end of the verifications. To play it safe, all userspace tests not waiting for the end of the transfer are now having a longer timeout: 2 minutes. The Fixes commit was making the connection longer, but still, the default timeout would have stopped it after 1 minute, which might not be enough in very slow environments. Fixes: 290493078b96 ("selftests: mptcp: join: userspace: longer transfer") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Geliang Tang <geliang@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-9-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * mptcp: fix address removal logic in mptcp_pm_nl_rm_addr Fix inverted WARN_ON_ONCE condition that prevented normal address removal counter updates. The current code only executes decrement logic when the counter is already 0 (abnormal state), while normal removals (counter > 0) are ignored. Signed-off-by: Gang Yan <yangang@kylinos.cn> Fixes: 636113918508 ("mptcp: pm: remove '_nl' from mptcp_pm_nl_rm_addr_received") Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-10-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * selftests: mptcp: add a check for 'add_addr_accepted' The previous patch fixed an issue with the 'add_addr_accepted' counter. This was not spot by the test suite. Check this counter and 'add_addr_signal' in MPTCP Join 'delete re-add signal' test. This should help spotting similar regressions later on. These counters are crucial for ensuring the MPTCP path manager correctly handles the subflow creation via 'ADD_ADDR'. Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-11-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> * LoongArch: Use UAPI types in ptrace UAPI header The kernel UAPI headers already contain fixed-width integer types, there is no need to rely on the libc types. There may not be a libc available or the libc may not provides the <stdint.h>, like for example on nolibc. This also aligns the header with the rest of the LoongArch UAPI headers. Fixes: 803b0fc5c3f2 ("LoongArch: Add process management") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Consolidate CPU names in /proc/cpuinfo Some processors have no IOCSR.VENDOR and IOCSR.CPUNAME, some processors have these registers but there is no valid information. Consolidate CPU names in /proc/cpuinfo: 1. Add "PRID" to display the PRID & Core-Name; 2. Let "Model Name" display "Unknown" if no valid name. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Fix NUMA node parsing with numa_memblks On physical machine, NUMA node id comes from high bit 44:48 of physical address. However it is not true on virt machine. With general method, it comes from ACPI SRAT table. Here the common function numa_memblks_init() is used to parse NUMA node information with numa_memblks. Cc: <stable@vger.kernel.org> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Mask all interrupts during kexec/kdump If the default state of the interrupt controllers in the first kernel don't mask any interrupts, it may cause the second kernel to potentially receive interrupts (which were previously allocated by the first kernel) immediately after a CPU becomes online during its boot process. These interrupts cannot be properly routed, leading to bad IRQ issues. This patch calls machine_kexec_mask_interrupts() to mask all interrupts during the kexec/kdump process. Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: Don't panic if no valid cache info for PCI If there is no valid cache info detected (may happen in virtual machine) for pci_dfl_cache_line_size, kernel shouldn't panic. Because in the PCI core it will be evaluated to (L1_CACHE_BYTES >> 2). Cc: <stable@vger.kernel.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * LoongArch: BPF: Disable trampoline for kernel module function trace The current LoongArch BPF trampoline implementation is incompatible with tracing functions in kernel modules. This causes several severe and user-visible problems: * The `bpf_selftests/module_attach` test fails consistently. * Kernel lockup when a BPF program is attached to a module function [1]. * Critical kernel modules like WireGuard experience traffic disruption when their functions are traced with fentry [2]. Given the severity and the potential for other unknown side-effects, it is safest to disable the feature entirely for now. This patch prevents the BPF subsystem from allowing trampoline attachments to kernel module functions on LoongArch. This is a temporary mitigation until the core issues in the trampoline code for kernel module handling can be identified and fixed. [root@fedora bpf]# ./test_progs -a module_attach -v bpf_testmod.ko is already unloaded. Loading bpf_testmod.ko... Successfully loaded bpf_testmod.ko. test_module_attach:PASS:skel_open 0 nsec test_module_attach:PASS:set_attach_target 0 nsec test_module_attach:PASS:set_attach_target_explicit 0 nsec test_module_attach:PASS:skel_load 0 nsec libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP test_module_attach:FAIL:skel_attach skeleton attach failed: -524 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Successfully unloaded bpf_testmod.ko. [1]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C13FMT58oqQ@mail.gmail.com/ [2]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcFaK93uoELPg@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: f9b6b41f0cf3 ("LoongArch: BPF: Add basic bpf trampoline support") Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> * smb: client: introduce close_cached_dir_locked() Replace close_cached_dir() calls under cfid_list_lock with a new close_cached_dir_locked() variant that uses kref_put() instead of kref_put_lock() to avoid recursive locking when dropping references. While the existing code works if the refcount >= 2 invariant holds, this area has proven error-prone. Make deadlocks impossible and WARN on invariant violations. Cc: stable@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> * cifs: fix memory leak in smb3_fs_context_parse_param error path Add proper cleanup of ctx->source and fc->source to the cifs_parse_mount_err error handler. This ensures that memory allocated for the source strings is correctly freed on all error paths, matching the cleanup already performed in the success path by smb3_cleanup_fs_context_contents(). Pointers are also set to NULL after freeing to prevent potential double-free issues. This change fixes a memory leak originally detected by syzbot. The leak occurred when processing Opt_source mount options if an error happened after ctx->source and fc->source were successfully allocated but before the function completed. The specific leak sequence was: 1. ctx->source = smb3_fs_context_fullpath(ctx, '/') allocates memory 2. fc->source = kstrdup(ctx->source, GFP_KERNEL) allocates more memory 3. A subsequent error jumps to cifs_parse_mount_err 4. The old error handler freed passwords but not the source strings, causing the memory to leak. This issue was not addressed by commit e8c73eb7db0a ("cifs: client: fix memory leak in smb3_fs_context_parse_param"), which only fixed leaks from repeated fsconfig() calls but not this error path. Patch updated with minor change suggested by kernel test robot Reported-by: syzbot+87be6809ed9bf6d718e3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=87be6809ed9bf6d718e3 Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Steve French <stfrench@microsoft.com> * cifs: Add the smb3_read_* tracepoints to SMB1 Add the smb3_read_* tracepoints to SMB1's cifs_async_readv() and cifs_readv_callback(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> * perf: Fix 0 count issue of cpu-clock Currently cpu-clock event always returns 0 count, e.g., perf stat -e cpu-clock -- sleep 1 Performance counter stats for 'sleep 1': 0 cpu-clock # 0.000 CPUs utilized 1.002308394 seconds time elapsed The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle error of some clock events")' adds PERF_EF_UPDATE flag check before calling cpu_clock_event_update() to update the count, however the PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in counting mode (pmu->dev() -> cpu_clock_event_del() -> cpu_clock_event_stop()). This leads to the cpu-clock event count is never updated. To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event just like what task-clock does. Fixes: bc4394e5e79c ("perf: Fix the throttle error of some clock events") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com * xfs: fix out of bounds memory read error in symlink repair xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> * net: dsa: microchip: lan937x: Fix RGMII delay tuning Correct RGMII delay application logic in lan937x_set_tune_adj(). The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the new delay value. This caused the new value to be bitwise-OR'd with the existing PORT_TUNE_ADJ field instead of replacing it. For example, when setting the RGMII 2 TX delay on port 4, the intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was incorrectly OR'd with the default 0x1B (from register value 0xDA3), leaving the delay at the wrong setting. This patch adds the missing mask to clear the field, ensuring the correct delay value is written. Physical measurements on the RGMII TX lines confirm the fix, showing the delay changing from ~1ns (before change) to ~2ns. While testing on i.MX 8MP showed this was within the platform's timing tolerance, it did not match the intended hardware-characterized value. Fixes: b19ac41faa3f ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> * l2tp: reset skb control buffer on xmit The L2TP stack did not reset the skb control buffer before sending the encapsulated package. In a setup with an ath10k radio and batman-adv over an L2TP tunnel massive fragmentations happen sporadically if the L2TP tunnel is established over IPv4. L2TP might reset some of the fields in the IP control buffer, but L2TP assumes the type of the control buffer to be of an IPv4 packet. In case the L2TP interface is used as a batadv hardif or the packet is an IPv6 packet, this assumption breaks. Clear the entire control buffer to avoid such mishaps altogether. Fixes: f77ae9390438 ("[PPPOL2TP]: Reset meta-data in xmit function") Signed-off-by: David Bauer <mail@david-bauer.net> Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> * ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan() Call scsi_device_put() in ata_scsi_dev_rescan() if the device or its queue are not running. Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") Cc: stable@vger.kernel.org Signed-off-by: Yihang Li <liyihang9@h-partners.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> * ata: libata-scsi: Fix system suspend for a security locked drive Commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") fixed ata_to_sense_error() to properly generate sense key ABORTED COMMAND (without any additional sense code), instead of the previous bogus sense key ILLEGAL REQUEST with the additional sense code UNALIGNED WRITE COMMAND, for a failed command. However, this broke suspend for Security locked drives (drives that have Security enabled, and have not been Security unlocked by boot firmware). The reason for this is that the SCSI disk driver, for the Synchronize Cache command only, treats any sense data with sense key ILLEGAL REQUEST as a successful command (regardless of ASC / ASCQ). After commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") the code that treats any sense data with sense key ILLEGAL REQUEST as a successful command is no longer applicable, so the command fails, which causes the system suspend to be aborted: sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5 sd 1:0:0:0: PM: failed to suspend async: error -5 PM: Some devices failed to suspend, or early wake event detected To make suspend work once again, for a Security locked device only, return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense data which a real SCSI device would have returned if locked. The SCSI disk driver treats this sense data as a successful command. Cc: stable@vger.kernel.org Reported-by: Ilia Baryshnikov <qwelias@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704 Fixes: cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> * ata: libata-core: Set capacity to zero for a security locked drive For Security locked drives (drives that have Security enabled, and have not been Security unlocked by boot firmware), the automatic partition scanning will result in the user being spammed with errors such as: ata5.00: failed command: READ DMA ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 7 dma 4096 in res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error) ata5.00: status: { DRDY ERR } ata5.00: error: { ABRT } sd 4:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s sd 4:0:0:0: [sda] tag#7 Sense Key : Aborted Command [current] sd 4:0:0:0: [sda] tag#7 Add. Sense: No additional sense information during boot, because most commands except for IDENTIFY will be aborted by a Security locked drive. For a Security locked drive, set capacity to zero, so that no automatic partition scanning will happen. If the user later unlocks the drive using e.g. hdparm, the close() by the user space application should trigger a revalidation of the drive. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> * be2net: pass wrb_params in case of OS2BMC be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL pointer when processing a workaround for specific packet, as commit bc0c3405abbb ("be2net: fix a Tx stall bug caused by a specific ipv6 packet") states. The correct way would be to pass the wrb_params from be_xmit(). Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.") Cc: stable@vger.kernel.org Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> * vsock: Ignore signal/timeout on connect() if already established During connect(), acting on a signal/timeout by disconnecting an already established socket leads to several issues: 1. connect() invoking vsock_transport_cancel_pkt() -> virtio_transport_purge_skbs() may race with sendmsg() invoking virtio_transport_get_credit(). This results in a permanently elevated `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling. 2. connect() resetting a connected socket's state may race with socket being placed in a sockmap. A disconnected socket remaining in a sockmap breaks sockmap's assumptions. And gives rise to WARNs. 3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a transport change/drop after TCP_ESTABLISHED. Which poses a problem for any simultaneous sendmsg() or connect() and may result in a use-after-free/null-ptr-deref. Do not disconnect socket on signal/timeout. Keep the logic for unconnected sockets: they don't linger, can't be placed in a sockmap, are rejected by sendmsg(). [1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/ [2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/ [3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/ Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org> * timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths tk_aux_sysfs_init() returns immediately on error during the auxiliary clock initialization loop without cleaning up previously allocated kobjects and sysfs groups. If kobject_create_and_add() or sysfs_create_group() fails during loop iteration, the parent kobjects (tko and auxo) and any previously created child kobjects are leaked. Fix this by adding proper error handling with goto labels to ensure all allocated resources are cleaned up on failure. kobject_put() on the parent kobjects will handle cleanup of their children. Fixes: 7b95663a3d96 ("timekeeping: Provide interface to control auxiliary clocks") Signed-off-by: Malaya Kumar Rout <mrout@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251120150213.246777-1-mrout@redhat.com * io_uring/cmd_net: fix wrong argument types for skb_queue_splice() If timestamp retriving needs to be retried and the local list of SKB's already has entries, then it's spliced back into the socket queue. However, the arguments for the splice helper are transposed, causing exactly the wrong direction of splicing into the on-stack list. Fix that up. Cc: stable@vger.kernel.org Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-462435176@google.com> Fixes: 9e4ed359b8ef ("io_uring/netcmd: add tx timestamping cmd support") Signed-off-by: Jens Axboe <axboe@kernel.dk> * sched_ext: Fix scx_enable() crash on helper kthread creation failure A crash was observed when the sched_ext selftests runner was terminated with Ctrl+\ while test 15 was running: NIP [c00000000028fa58] scx_enable.constprop.0+0x358/0x12b0 LR [c00000000028fa2c] scx_enable.constprop.0+0x32c/0x12b0 Call Trace: scx_enable.constprop.0+0x32c/0x12b0 (unreliable) bpf_struct_ops_link_create+0x18c/0x22c __sys_bpf+0x23f8/0x3044 sys_bpf+0x2c/0x6c system_call_exception+0x124/0x320 system_call_vectored_common+0x15c/0x2ec kthread_run_worker() returns an ERR_PTR() on failure rather than NULL, but the current code in scx_alloc_and_add_sched() only checks for a NULL helper. Incase of failure on SIGQUIT, the error is not handled in scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an error pointer. Error handling is fixed in scx_alloc_and_add_sched() to propagate PTR_ERR() into ret, so that scx_enable() jumps to the existing error path, avoiding random dereference on failure. Fixes: bff3b5aec1b7 ("sched_ext: Move disable machinery into scx_sched") Cc: stable@vger.kernel.org # v6.16+ Reported-and-tested-by: Samir Mulani <samir@linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org> * dm: fix failure when empty flush's bi_sector points beyond the device end An empty flush bio can have arbitrary bi_sector. The commit 2b1c6d7a890a introduced a regression that device mapper would fail an empty flush bio with -EIO if the sector pointed beyond the end of the device. The commit introduced an optimization, that optimization would pass flushes to __split_and_process_bio and __split_and_process_bio is not prepared to handle empty bios. Fix this bug by passing only non-empty flushes to __split_and_process_bio - non-empty flushes must have valid bi_sector. Empty bios will go through __send_empty_flush, as they did before the optimization. This problem can be reproduced by running the lvm2 test: make check_local T=lvconvert-thin.sh LVM_TEST_PREFER_BRD=0 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 2b1c6d7a890a ("dm: optimize REQ_PREFLUSH with data when using the linear target") Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> * selinux: rename task_security_struct to cred_security_struct Before Linux had cred structures, the SELinux task_security_struct was per-task and although the structure was switched to being per-cred long ago, the name was never updated. This change renames it to cred_security_struct to avoid confusion and pave the way for the introduction of an actual per-task security structure for SELinux. No functional change. Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> * selinux: move avdcache to per-task security struct The avdcache is meant to be per-task; move it to a new task_security_struct that is duplicated per-task. Cc: stable@vger.kernel.org Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead") Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: line length fixes] Signed-off-by: Paul Moore <paul@paul-moore.com> * selinux: rename the cred_security_struct variables to "crsec" Along with the renaming from task_security_struct to cred_security_struct, rename the local variables to "crsec" from "tsec". This both fits with existing conventions and helps distinguish between task and cred related variables. No functional changes. Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> * Bluetooth: btusb: mediatek: Fix kernel crash when releasing mtk iso interface When performing reset tests and encountering abnormal card drop issues that lead to a kernel crash, it is necessary to perform a null check before releasing resources to avoid attempting to release a null pointer. <4>[ 29.158070] Hardware name: Google Quigon sku196612/196613 board (DT) <4>[ 29.158076] Workqueue: hci0 hci_cmd_sync_work [bluetooth] <4>[ 29.158154] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) <4>[ 29.158162] pc : klist_remove+0x90/0x158 <4>[ 29.158174] lr : klist_remove+0x88/0x158 <4>[ 29.158180] sp : ffffffc0846b3c00 <4>[ 29.158185] pmr_save: 000000e0 <4>[ 29.158188] x29: ffffffc0846b3c30 x28: ffffff80cd31f880 x27: ffffff80c1bdc058 <4>[ 29.158199] x26: dead000000000100 x25: ffffffdbdc624ea3 x24: ffffff80c1bdc4c0 <4>[ 29.158209] x23: ffffffdbdc62a3e6 x22: ffffff80c6c07000 x21: ffffffdbdc829290 <4>[ 29.158219] x20: 0000000000000000 x19: ffffff80cd3e0648 x18: 000000031ec97781 <4>[ 29.158229] x17: ffffff80c1bdc4a8 x16: ffffffdc10576548 x15: ffffff80c1180428 <4>[ 29.158238] x14: 0000000000000000 x13: 000000000000e380 x12: 0000000000000018 <4>[ 29.158248] x11: ffffff80c2a7fd10 x10: 0000000000000000 x9 : 0000000100000000 <4>[ 29.158257] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 2d7223ff6364626d <4>[ 29.158266] x5 : 0000008000000000 x4 : 0000000000000020 x3 : 2e7325006465636e <4>[ 29.158275] x2 : ffffffdc11afeff8 x1 : 0000000000000000 x0 : ffffffdc11be4d0c <4>[ 29.158285] Call trace: <4>[ 29.158290] klist_remove+0x90/0x158 <4>[ 29.158298] device_release_driver_internal+0x20c/0x268 <4>[ 29.158308] device_release_driver+0x1c/0x30 <4>[ 29.158316] usb_driver_release_interface+0x70/0x88 <4>[ 29.158325] btusb_mtk_release_iso_intf+0x68/0xd8 [btusb (HASH:e8b6 5)] <4>[ 29.158347] btusb_mtk_reset+0x5c/0x480 [btusb (HASH:e8b6 5)] <4>[ 29.158361] hci_cmd_sync_work+0x10c/0x188 [bluetooth (HASH:a4fa 6)] <4>[ 29.158430] process_scheduled_works+0x258/0x4e8 <4>[ 29.158441] worker_thread+0x300/0x428 <4>[ 29.158448] kthread+0x108/0x1d0 <4>[ 29.158455] ret_from_fork+0x10/0x20 <0>[ 29.158467] Code: 91343000 940139d1 f9400268 927ff914 (f9401297) <4>[ 29.158474] ---[ end trace 0000000000000000 ]--- <0>[ 29.167129] Kernel panic - not syncing: Oops: Fatal exception <2>[ 29.167144] SMP: stopping secondary CPUs <4>[ 29.167158] ------------[ cut here ]------------ Fixes: ceac1cb0259d ("Bluetooth: btusb: mediatek: add ISO data transmission functions") Signed-off-by: Chris Lu <chris.lu@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: hci_core: Fix triggering cmd_timer for HCI_OP_NOP HCI_OP_NOP means no command was actually sent so there is no point in triggering cmd_timer which may cause a hdev->reset in the process since it is assumed that the controller is stuck processing a command. Fixes: e2d471b7806b ("Bluetooth: ISO: Fix not using SID from adv report") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: hci_sock: Prevent race in socket write iter and sock bind There is a potential race condition between sock bind and socket write iter. bind may free the same cmd via mgmt_pending before write iter sends the cmd, just as syzbot reported in UAF[1]. Here we use hci_dev_lock to synchronize the two, thereby avoiding the UAF mentioned in [1]. [1] syzbot reported: BUG: KASAN: slab-use-after-free in mgmt_pending_remove+0x3b/0x210 net/bluetooth/mgmt_util.c:316 Read of size 8 at addr ffff888077164818 by task syz.0.17/5989 Call Trace: mgmt_pending_remove+0x3b/0x210 net/bluetooth/mgmt_util.c:316 set_link_security+0x5c2/0x710 net/bluetooth/mgmt.c:1918 hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719 hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 sock_write_iter+0x279/0x360 net/socket.c:1195 Allocated by task 5989: mgmt_pending_add+0x35/0x140 net/bluetooth/mgmt_util.c:296 set_link_security+0x557/0x710 net/bluetooth/mgmt.c:1910 hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719 hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 sock_write_iter+0x279/0x360 net/socket.c:1195 Freed by task 5991: mgmt_pending_free net/bluetooth/mgmt_util.c:311 [inline] mgmt_pending_foreach+0x30d/0x380 net/bluetooth/mgmt_util.c:257 mgmt_index_removed+0x112/0x2f0 net/bluetooth/mgmt.c:9477 hci_sock_bind+0xbe9/0x1000 net/bluetooth/hci_sock.c:1314 Fixes: 6fe26f694c82 ("Bluetooth: MGMT: Protect mgmt_pending list with its own lock") Reported-by: syzbot+9aa47cd4633a3cf92a80@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9aa47cd4633a3cf92a80 Tested-by: syzbot+9aa47cd4633a3cf92a80@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: hci_core: lookup hci_conn on RX path on protocol side The hdev lock/lookup/unlock/use pattern in the packet RX path doesn't ensure hci_conn* is not concurrently modified/deleted. This locking appears to be leftover from before conn_hash started using RCU commit bf4c63252490b ("Bluetooth: convert conn hash to RCU") and not clear if it had purpose since then. Currently, there are code paths that delete hci_conn* from elsewhere than the ordered hdev->workqueue where the RX work runs in. E.g. commit 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") introduced some of these, and there probably were a few others before it. It's better to do the locking so that even if these run concurrently no UAF is possible. Move the lookup of hci_conn and associated socket-specific conn to protocol recv handlers, and do them within a single critical section to cover hci_conn* usage and lookup. syzkaller has reported a crash that appears to be this issue: [Task hdev->workqueue] [Task 2] hci_disconnect_all_sync l2cap_recv_acldata(hcon) hci_conn_get(hcon) hci_abort_conn_sync(hcon) hci_dev_lock hci_dev_lock hci_conn_del(hcon) v-------------------------------- hci_dev_unlock hci_conn_put(hcon) conn = hcon->l2cap_data (UAF) Fixes: 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync") Reported-by: syzbot+d32d77220b92eddd89ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d32d77220b92eddd89ad Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: btusb: mediatek: Avoid btusb_mtk_claim_iso_intf() NULL deref In btusb_mtk_setup(), we set `btmtk_data->isopkt_intf` to: usb_ifnum_to_if(data->udev, MTK_ISO_IFNUM) That function can return NULL in some cases. Even when it returns NULL, though, we still go on to call btusb_mtk_claim_iso_intf(). As of commit e9087e828827 ("Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface()"), calling btusb_mtk_claim_iso_intf() when `btmtk_data->isopkt_intf` is NULL will cause a crash because we'll end up passing a bad pointer to device_lock(). Prior to that commit we'd pass the NULL pointer directly to usb_driver_claim_interface() which would detect it and return an error, which was handled. Resolve the crash in btusb_mtk_claim_iso_intf() by adding a NULL check at the start of the function. This makes the code handle a NULL `btmtk_data->isopkt_intf` the same way it did before the problematic commit (just with a slight change to the error message printed). Reported-by: IncogCyberpunk <incogcyberpunk@proton.me> Closes: http://lore.kernel.org/r/a380d061-479e-4713-bddd-1d6571ca7e86@leemhuis.info Fixes: e9087e828827 ("Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface()") Cc: stable@vger.kernel.org Tested-by: IncogCyberpunk <incogcyberpunk@proton.me> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * Bluetooth: SMP: Fix not generating mackey and ltk when repairing The change eed467b517e8 ("Bluetooth: fix passkey uninitialized when used") introduced a goto that bypasses the creation of temporary mackey and ltk which are later used by the likes of DHKey Check step. Later ffee202a78c2 ("Bluetooth: Always request for user confirmation for Just Works (LE SC)") which means confirm_hint is always set in case JUST_WORKS so the branch checking for an existing LTK becomes pointless as confirm_hint will always be set, so this just merge both cases of malicious or legitimate devices to be confirmed before continuing with the pairing procedure. Link: https://github.com/bluez/bluez/issues/1622 Fixes: eed467b517e8 ("Bluetooth: fix passkey uninitialized when used") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> * net: atm: fix incorrect cleanup function call in error path In atm_init(), if atmsvc_init() fails, the code jumps to out_atmpvc_exit label which incorrectly calls atmsvc_exit() instead of atmpvc_exit(). This results in calling the wrong cleanup function and failing to properly clean up atmpvc_init(). Fix this by calling atmpvc_exit() in the out_atmpvc_exit error path. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sayooj K Karun <sayooj@aerlync.com> Link: https://patch.msgid.link/20251119085747.67139-1-sayooj@aerlync.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> * veth: reduce XDP no_direct return section to fix race As explain in commit fa349e396e48 ("veth: Fix race with AF_XDP exposing old or uninitialized descriptors") for veth there is a chance after napi_complete_done() that another CPU can manage start another NAPI instance running veth_pool(). For NAPI this is correctly handled as the napi_schedule_prep() check will prevent multiple instances from getting scheduled, but for the remaining code in veth_pool() this can run concurrent with the newly started NAPI instance. The problem/race is that xdp_clear_return_frame_no_direct() isn't designed to be nested. Prior to commit 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") the temporary BPF net context bpf_redirect_info was stored per CPU, where this wasn't an issue. Since this commit the BPF context is stored in 'current' task_struct. When running veth in threaded-NAPI mode, then the kthread becomes the storage area. Now a race exists between two concurrent veth_pool() function calls one exiting NAPI and one running new NAPI, both using the same BPF net context. Race is when another CPU gets within the xdp_set_return_frame_no_direct() section before exiting veth_pool() calls the clear-function xdp_clear_return_frame_no_direct(). Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/176356963888.337072.4805242001928705046.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org> * ALSA: hda/realtek: add quirk for HP pavilion aero laptop 13z-be200 The laptop uses ALC287 chip (as shown in /proc/asound/card1/codec#0). It seems that every HP pavilion laptop in the table uses the same quirk, so I just copied them. I have verified that the mute LED on my laptop works with this patch. For reference, here's the alsa-info of my laptop: https://alsa-project.org/db/?f=2d5f297087708610bc01816ab12052abdd4a17c0 Signed-off-by: Jacob Zhong <cmpute@qq.com> Link: https://patch.msgid.link/tencent_E2DFA33EFDF39E0517A94FA8FF06C05C0709@qq.com Signed-off-by: Takashi Iwai <tiwai@suse.de> * dm-verity: fix unreliable memory allocation GFP_NOWAIT allocation may fail anytime. It needs to be changed to GFP_NOIO. There's no need to handle an error because mempool_alloc with GFP_NOIO can't fail. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Biggers <ebiggers@kernel.org> * MIPS: mm: Prevent a TLB shutdown on initial uniquification Depending on the particular CPU implementation a TLB shutdown may occur if multiple matching entries are detected upon the execution of a TLBP or the TLBWI/TLBWR instructions. Given that we don't know what entries we have been handed we need to be very careful with the initial TLB setup and avoid all these instructions. Therefore read all the TLB entries one by one with the TLBR instruction, bypassing the content addressing logic, and truncate any large pages in place so as to avoid a case in the second step where an incoming entry for a large page at a lower address overlaps with a replacement entry chosen at another index. Then preinitialize the TLB using addresses outside our usual unique range and avoiding clashes with any entries received, before making the usual call to local_flush_tlb_all(). This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual address). Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init") Cc: stable@vger.kernel.org Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> * MIPS: kernel: Fix random segmentation faults Commit 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation") switches to a generic vdso storage, which increases the number of data pages from 1 to 4. But there is only one page reserved, which causes segementation faults depending where the VDSO area is randomized to. To fix this use the same size of reservation and allocation of the VDSO data pages. Fixes: 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation") Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> * usb: storage: sddr55: Reject out-of-bound new_pba Discovered by Atuin - Automated Vulnerability Discovery Engine. new_pba comes from the status packet returned after each write. A bogus device could report values beyond the block count derived from info->capacity, letting the driver walk off the end of pba_to_lba[] and corrupt heap memory. Reject PBAs that exceed the computed block count and fail the transfer so we avoid touching out-of-range mapping entries. Signed-off-by: Tianchu Chen <flynnnchen@tencent.com> Cc: stable <stable@kernel.org> Link: https://patch.msgid.link/B2DC73A3EE1E3A1D+202511161322001664687@tencent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * xhci: dbgtty: fix device unregister When DbC is disconnected then xhci_dbc_tty_unregister_device() is called. However if there is any user space process blocked on write to DbC terminal device then it will never be signalled and thus stay blocked indifinitely. This fix adds a tty_vhangup() call in xhci_dbc_tty_unregister_device(). The tty_vhangup() wakes up any blocked writers and causes subsequent write attempts to DbC terminal device to fail. Cc: stable <stable@kernel.org> Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver") Signed-off-by: Łukasz Bartosik <ukaszb@chromium.org> Link: https://patch.msgid.link/20251119212910.1245694-1-ukaszb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * usb: dwc3: Fix race condition between concurrent dwc3_remove_requests() call paths This patch addresses a race condition caused by unsynchronized execution of multiple call paths invoking `dwc3_remove_requests()`, leading to premature freeing of USB requests and subsequent crashes. Three distinct execution paths interact with `dwc3_remove_requests()`: Path 1: Triggered via `dwc3_gadget_reset_interrupt()` during USB reset handling. The call stack includes: - `dwc3_ep0_reset_state()` - `dwc3_ep0_stall_and_restart()` - `dwc3_ep0_out_start()` - `dwc3_remove_requests()` - `dwc3_gadget_del_and_unmap_request()` Path 2: Also initiated from `dwc3_gadget_reset_interrupt()`, but through `dwc3_stop_active_transfers()`. The call stack includes: - `dwc3_stop_active_transfers()` - `dwc3_remove_requests()` - `dwc3_gadget_del_and_unmap_request()` Path 3: Occurs independently during `adb root` execution, which triggers USB function unbind and bind operations. The sequence includes: - `gserial_disconnect()` - `usb_ep_disable()` - `dwc3_gadget_ep_disable()` - `dwc3_remove_requests()` with `-ESHUTDOWN` status Path 3 operates asynchronously and lacks synchronization with Paths 1 and 2. When Path 3 completes, it disables endpoints and frees 'out' requests. If Paths 1 or 2 are still processing these requests, accessing freed memory leads to a crash due to use-after-free conditions. To fix this added check for request completion and skip processing if already completed and added the request status for ep0 while queue. Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver") Cc: stable <stable@kernel.org> Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Manish Nagar <manish.nagar@oss.qualcomm.com> Link: https://patch.msgid.link/20251120074435.1983091-1-manish.nagar@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * usb: uas: fix urb unmapping issue when the uas device is remove during ongoing data transfer When a UAS device is unplugged during data transfer, there is a probability of a system panic occurring. The root cause is an access to an invalid memory address during URB callback handling. Specifically, this happens when the dma_direct_unmap_sg() function is called within the usb_hcd_unmap_urb_for_dma() interface, but the sg->dma_address field is 0 and the sg data structure has already been freed. The SCSI driver sends transfer commands by invoking uas_queuecommand_lck() in uas.c, using the uas_submit_urbs() function to submit requests to USB. Within the uas_submit_urbs() implementation, three URBs (sense_urb, data_urb, and cmd_urb) are sequentially submitted. Device removal may occur at any point during uas_submit_urbs execution, which may result in URB submission failure. However, some URBs might have been successfully submitted before the failure, and uas_submit_urbs will return the -ENODEV error code in this case. The current error handling directly calls scsi_done(). In the SCSI driver, this eventually triggers scsi_complete() to invoke scsi_end_request() for releasing the sgtable. The successfully submitted URBs, when being unlinked to giveback, call usb_hcd_unmap_urb_for_dma() in hcd.c, leading to exceptions during sg unmapping operations since the sg data structure has already been freed. This patch modifies the error condition check in the uas_submit_urbs() function. When a UAS device is removed but one or more URBs have already been successfully submitted to USB, it avoids immediately invoking scsi_done() and save the cmnd to devinfo->cmnd array. If the successfully submitted URBs is completed before devinfo->resetting being set, then the scsi_done() function will be called within uas_try_complete() after all pending URB operations are finalized. Otherwise, the scsi_done() function will be called within uas_zap_pending(), which is executed after usb_kill_anchored_urbs(). The error handling only takes effect when uas_queuecommand_lck() calls uas_submit_urbs() and returns the error value -ENODEV . In this case, the device is disconnected, and the flow proceeds to uas_disconnect(), where uas_zap_pending() is invoked to call uas_try_complete(). Fixes: eb2a86ae8c54 ("USB: UAS: fix disconnect by unplugging a hub") Cc: stable <stable@kernel.org> Signed-off-by: Yu Chen <chenyu45@xiaomi.com> Signed-off-by: Owen Gu <guhuinan@xiaomi.com> Acked-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20251120123336.3328-1-guhuinan@xiaomi.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * spi: spi-fsl-lpspi: fix watermark truncation caused by type cast 't->len' is an unsigned integer, while 'watermark' and 'txfifosize' are u8. Using min_t with typeof(watermark) forces bo…

xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-torvalds#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-torvalds#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>

[ Upstream commit 678e1cc ] xfs/286 produced this report on my test fleet: ================================================================== BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110 Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-torvalds#184): memcpy_orig+0x54/0x110 xrep_symlink_salvage_inline+0xb3/0xf0 [xfs] xrep_symlink_salvage+0x100/0x110 [xfs] xrep_symlink+0x2e/0x80 [xfs] xrep_attempt+0x61/0x1f0 [xfs] xfs_scrub_metadata+0x34f/0x5c0 [xfs] xfs_ioc_scrubv_metadata+0x387/0x560 [xfs] xfs_file_ioctl+0xe23/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 kfence-torvalds#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128 allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago): xfs_init_local_fork+0x79/0xe0 [xfs] xfs_iformat_local+0xa4/0x170 [xfs] xfs_iformat_data_fork+0x148/0x180 [xfs] xfs_inode_from_disk+0x2cd/0x480 [xfs] xfs_iget+0x450/0xd60 [xfs] xfs_bulkstat_one_int+0x6b/0x510 [xfs] xfs_bulkstat_iwalk+0x1e/0x30 [xfs] xfs_iwalk_ag_recs+0xdf/0x150 [xfs] xfs_iwalk_run_callbacks+0xb9/0x190 [xfs] xfs_iwalk_ag+0x1dc/0x2f0 [xfs] xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs] xfs_iwalk+0xa4/0xd0 [xfs] xfs_bulkstat+0xfa/0x170 [xfs] xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs] xfs_file_ioctl+0xbf2/0x10e0 [xfs] __x64_sys_ioctl+0x76/0xc0 do_syscall_64+0x4e/0x1e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014 ================================================================== On further analysis, I realized that the second parameter to min() is not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data buffer. if_bytes can be smaller than the data fork size because: (a) the forkoff code tries to keep the data area as large as possible (b) for symbolic links, if_bytes is the ondisk file size + 1 (c) forkoff is always a multiple of 8. Case in point: for a single-byte symlink target, forkoff will be 8 but the buffer will only be 2 bytes long. In other words, the logic here is wrong and we walk off the end of the incore buffer. Fix that. Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Update HOWTO

02acfc4

laijs pushed a commit to laijs/linux that referenced this pull request Feb 13, 2017

Merge pull request torvalds#184 from hkchu/offload

bafaea1

lkl: Add offload (TSO4, CSUM) support to LKL device, #2 of 2

mpe pushed a commit to mpe/linux that referenced this pull request Apr 13, 2021

Merge pull request torvalds#184 from ojeda/improv

1fa10de

A few improvements

torvalds closed this Sep 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fixed spelling errors #184

fixed spelling errors #184

Uh oh!

dgucsekelly commented May 11, 2015

Uh oh!

ecnepsnai commented May 11, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fixed spelling errors #184

fixed spelling errors #184

Uh oh!

Conversation

dgucsekelly commented May 11, 2015

Uh oh!

ecnepsnai commented May 11, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants