Skip to content

Conversation

@redbrain17
Copy link

just merged longterm supported branch 4.4.39 into linux-fslc 4.4.x branch

ajdlinux and others added 18 commits December 15, 2016 08:49
commit 409bf7f upstream.

In eeh_reset_device(), we take the pci_rescan_remove_lock immediately after
after we call eeh_reset_pe() to reset the PCI controller. We then call
eeh_clear_pe_frozen_state(), which can return an error. In this case, we
bail out of eeh_reset_device() without calling pci_unlock_rescan_remove().

Add a call to pci_unlock_rescan_remove() in the eeh_clear_pe_frozen_state()
error path so that we don't cause a deadlock later on.

Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Fixes: 7895470 ("powerpc/eeh: Avoid I/O access during PE reset")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c78e710 upstream.

The attached change interchanges the order of purging the TLB and
setting the corresponding page table entry.  TLB purges are strongly
ordered.  It occurred to me one night that setting the PTE first might
have subtle ordering issues on SMP machines and cause random memory
corruption.

A TLB lock guards the insertion of user TLB entries.  So after the TLB
is purged, a new entry can't be inserted until the lock is released.
This ensures that the new PTE value is used when the lock is released.

Since making this change, no random segmentation faults have been
observed on the Debian hppa buildd servers.

Signed-off-by: John David Anglin  <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…flush_icache_page_asm

commit febe429 upstream.

We have four routines in pacache.S that use temporary alias pages:
copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and
flush_icache_page_asm().  copy_user_page_asm() and clear_user_page_asm()
don't purge the TLB entry used for the operation.
flush_dcache_page_asm() and flush_icache_page_asm do purge the entry.

Presumably, this was thought to optimize TLB use.  However, the
operation is quite heavy weight on PA 1.X processors as we need to take
the TLB lock and a TLB broadcast is sent to all processors.

This patch removes the purges from flush_dcache_page_asm() and
flush_icache_page_asm.

Signed-off-by: John David Anglin  <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24d0492 upstream.

At bootup we run measurements to calculate the best threshold for when we
should be using full TLB flushes instead of just flushing a specific amount of
TLB entries.  This performance test is run over the kernel text segment.

But running this TLB performance test on the kernel text segment turned out to
crash some SMP machines when the kernel text pages were mapped as huge pages.

To avoid those crashes this patch simply skips this test on some SMP machines
and calculates an optimal threshold based on the maximum number of available
TLB entries and number of online CPUs.

On a technical side, this seems to happen:
The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
instruction seems to work without problems even if the pages were mapped as
huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
other CPUs. Those CPUs then crash the machine because the page size is not as
expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
problem, because the required cache coherency prohibits to use huge pages at
all.  Sadly I didn't found any documentation about this behaviour, so this
finding is purely based on testing with phyiscal SMP machines (A500-44 and
J5000, both were 2-way boxes).

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c7e9cc upstream.

zram hot_add sysfs attribute is a very 'special' attribute - reading
from it creates a new uninitialized zram device.  This file, by a
mistake, can be read by a 'normal' user at the moment, while only root
must be able to create a new zram device, therefore hot_add attribute
must have S_IRUSR mode, not S_IRUGO.

[akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols]
Fixes: 6566d1a ("zram: add dynamic device add/remove functionality")
Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Allen <steven@stebalien.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbb2605 upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l->owner=T1
		rt_mutex_lock(l)
		lock(l->wait_lock)
		l->owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l->wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l->wait_lock)
				l->owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l->wait_lock)
				schedule()
		signal(->T2)	signal(->T3)
		lock(l->wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l->wait_lock)
				lock(l->wait_lock)
				dequeue(T3)
				  ===> wait list is now empty
				deboost()
				 unlock(l->wait_lock)
		lock(l->wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l->owner & ~HAS_WAITERS;
		    l->owner = owner
		     ==> l->owner = T1
		  }

				lock(l->wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
				    l->owner = owner
				     ==> l->owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 23f78d4 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1be5d4f upstream.

While debugging the rtmutex unlock vs. dequeue race Will suggested to use
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().

Will: "It's a minor thing which will most likely not matter in practice"

Careful search did not unearth an actual problem in todays code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f612a7 upstream.

Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM.

Both these parts have full_width_write set, and that does indeed have
a problem. In order to deal with counter wrap, we must sample the
counter at at least half the counter period (see also the sampling
theorem) such that we can unambiguously reconstruct the count.

However commit:

  069e0c3 ("perf/x86/intel: Support full width counting")

sets the sampling interval to the full period, not half.

Fixing that exposes another issue, in that we must not sign extend the
delta value when we shift it right; the counter cannot have
decremented after all.

With both these issues fixed, counter overflow functions correctly
again.

Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Tested-by: Liang, Kan <kan.liang@intel.com>
Tested-by: Odzioba, Lukasz <lukasz.odzioba@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 069e0c3 ("perf/x86/intel: Support full width counting")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48a9927 upstream.

Algorithms not compatible with mcryptd could be spawned by mcryptd
with a direct crypto_alloc_tfm invocation using a "mcryptd(alg)" name
construct.  This causes mcryptd to crash the kernel if an arbitrary
"alg" is incompatible and not intended to be used with mcryptd.  It is
an issue if AF_ALG tries to spawn mcryptd(alg) to expose it externally.
But such algorithms must be used internally and not be exposed.

We added a check to enforce that only internal algorithms are allowed
with mcryptd at the time mcryptd is spawning an algorithm.

Link: http://marc.info/?l=linux-crypto-vger&m=148063683310477&w=2
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 332b05c upstream.

This patch adds a check to limit the number of can_filters that can be
set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
are not prevented resulting in a warning.

Reference: https://lkml.org/lkml/2016/12/2/230

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b67d0dd upstream.

Fix for bad memory access while disconnecting. netdev is freed before
private data free, and dev is accessed after freeing netdev.

This makes a slub problem, and it raise kernel oops with slub debugger
config.

Signed-off-by: Jiho Chu <jiho.chu@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 811d61e upstream.

futex.h's futex_atomic_cmpxchg_inatomic() does not use the
__futex_atomic_op() macro and needs its own PAN toggling. This was missed
when the feature was implemented.

Fixes: 338d4f4 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e251bb upstream.

The current ndelay() macro definition has an extra semi-colon at the
end of the line thus leading to a compilation error when ndelay is used
in a conditional block without curly braces like this one:

	if (cond)
		ndelay(t);
	else
		...

which, after the preprocessor pass gives:

	if (cond)
		m68k_ndelay(t);;
	else
		...

thus leading to the following gcc error:

	error: 'else' without a previous 'if'

Remove this extra semi-colon.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: c8ee038 ("m68k: Implement ndelay() based on the existing udelay() logic")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2d0f48 upstream.

batadv_tt_prepare_tvlv_local_data can fail to allocate the memory for the
new TVLV block. The caller is informed about this problem with the returned
length of 0. Not checking this value results in an invalid memory access
when either tt_data or tt_change is accessed.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 7ea7b4a ("batman-adv: make the TT CRC logic VLAN specific")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 777c6e0 upstream.

Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following

[  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[  145.122628] Call Trace:
[  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17

This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.

Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.

Fixes: 47e627b ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5789163 upstream.

Both asn1 headers are included by rsa_helper.c, so rsa_helper.o
should explicitly depend on them.

Signed-off-by: David Michael <david.michael@coreos.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Tuomas Tynkkynen <tuomas@tuxera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
just merged longterm supported branch 4.4.39 into linux-fslc 4.4.x branch
@otavio otavio merged commit 420f83a into Freescale:4.4.x Dec 20, 2016
@otavio
Copy link
Member

otavio commented Dec 20, 2016

Please send a bump for the recipe as well; if you are in the mood please take a look also on 4.8 and 4.9 ;-)

@redbrain17
Copy link
Author

redbrain17 commented Dec 23, 2016 via email

otavio pushed a commit that referenced this pull request Mar 9, 2017
commit f222449 upstream.

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[   48.950592] ======================================================
[   48.950622] [ INFO: possible circular locking dependency detected ]
[   48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[   48.950622] -------------------------------------------------------
[   48.950622] kworker/0:0/3 is trying to acquire lock:
[   48.950653]  (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[   48.950683]
               but task is already holding lock:
[   48.950683]  (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[   48.950714]
               which lock already depends on the new lock.

[   48.950714]
               the existing dependency chain (in reverse order) is:
[   48.950714]
               -> #5 (hrtimer_bases.lock){-.-...}:
[   48.950744]        _raw_spin_lock_irqsave+0x50/0x64
[   48.950775]        lock_hrtimer_base+0x28/0x58
[   48.950775]        hrtimer_start_range_ns+0x20/0x5c8
[   48.950775]        __enqueue_rt_entity+0x320/0x360
[   48.950805]        enqueue_rt_entity+0x2c/0x44
[   48.950805]        enqueue_task_rt+0x24/0x94
[   48.950836]        ttwu_do_activate+0x54/0xc0
[   48.950836]        try_to_wake_up+0x248/0x5c8
[   48.950836]        __setup_irq+0x420/0x5f0
[   48.950836]        request_threaded_irq+0xdc/0x184
[   48.950866]        devm_request_threaded_irq+0x58/0xa4
[   48.950866]        omap_i2c_probe+0x530/0x6a0
[   48.950897]        platform_drv_probe+0x50/0xb0
[   48.950897]        driver_probe_device+0x1f8/0x2cc
[   48.950897]        __driver_attach+0xc0/0xc4
[   48.950927]        bus_for_each_dev+0x6c/0xa0
[   48.950927]        bus_add_driver+0x100/0x210
[   48.950927]        driver_register+0x78/0xf4
[   48.950958]        do_one_initcall+0x3c/0x16c
[   48.950958]        kernel_init_freeable+0x20c/0x2d8
[   48.950958]        kernel_init+0x8/0x110
[   48.950988]        ret_from_fork+0x14/0x24
[   48.950988]
               -> #4 (&rt_b->rt_runtime_lock){-.-...}:
[   48.951019]        _raw_spin_lock+0x40/0x50
[   48.951019]        rq_offline_rt+0x9c/0x2bc
[   48.951019]        set_rq_offline.part.2+0x2c/0x58
[   48.951049]        rq_attach_root+0x134/0x144
[   48.951049]        cpu_attach_domain+0x18c/0x6f4
[   48.951049]        build_sched_domains+0xba4/0xd80
[   48.951080]        sched_init_smp+0x68/0x10c
[   48.951080]        kernel_init_freeable+0x160/0x2d8
[   48.951080]        kernel_init+0x8/0x110
[   48.951080]        ret_from_fork+0x14/0x24
[   48.951110]
               -> #3 (&rq->lock){-.-.-.}:
[   48.951110]        _raw_spin_lock+0x40/0x50
[   48.951141]        task_fork_fair+0x30/0x124
[   48.951141]        sched_fork+0x194/0x2e0
[   48.951141]        copy_process.part.5+0x448/0x1a20
[   48.951171]        _do_fork+0x98/0x7e8
[   48.951171]        kernel_thread+0x2c/0x34
[   48.951171]        rest_init+0x1c/0x18c
[   48.951202]        start_kernel+0x35c/0x3d4
[   48.951202]        0x8000807c
[   48.951202]
               -> #2 (&p->pi_lock){-.-.-.}:
[   48.951232]        _raw_spin_lock_irqsave+0x50/0x64
[   48.951232]        try_to_wake_up+0x30/0x5c8
[   48.951232]        up+0x4c/0x60
[   48.951263]        __up_console_sem+0x2c/0x58
[   48.951263]        console_unlock+0x3b4/0x650
[   48.951263]        vprintk_emit+0x270/0x474
[   48.951293]        vprintk_default+0x20/0x28
[   48.951293]        printk+0x20/0x30
[   48.951324]        kauditd_hold_skb+0x94/0xb8
[   48.951324]        kauditd_thread+0x1a4/0x56c
[   48.951324]        kthread+0x104/0x148
[   48.951354]        ret_from_fork+0x14/0x24
[   48.951354]
               -> #1 ((console_sem).lock){-.....}:
[   48.951385]        _raw_spin_lock_irqsave+0x50/0x64
[   48.951385]        down_trylock+0xc/0x2c
[   48.951385]        __down_trylock_console_sem+0x24/0x80
[   48.951385]        console_trylock+0x10/0x8c
[   48.951416]        vprintk_emit+0x264/0x474
[   48.951416]        vprintk_default+0x20/0x28
[   48.951416]        printk+0x20/0x30
[   48.951446]        tk_debug_account_sleep_time+0x5c/0x70
[   48.951446]        __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[   48.951446]        timekeeping_resume+0x218/0x23c
[   48.951477]        syscore_resume+0x94/0x42c
[   48.951477]        suspend_enter+0x554/0x9b4
[   48.951477]        suspend_devices_and_enter+0xd8/0x4b4
[   48.951507]        enter_state+0x934/0xbd4
[   48.951507]        pm_suspend+0x14/0x70
[   48.951507]        state_store+0x68/0xc8
[   48.951538]        kernfs_fop_write+0xf4/0x1f8
[   48.951538]        __vfs_write+0x1c/0x114
[   48.951538]        vfs_write+0xa0/0x168
[   48.951568]        SyS_write+0x3c/0x90
[   48.951568]        __sys_trace_return+0x0/0x10
[   48.951568]
               -> #0 (tk_core){----..}:
[   48.951599]        lock_acquire+0xe0/0x294
[   48.951599]        ktime_get_update_offsets_now+0x5c/0x1d4
[   48.951629]        retrigger_next_event+0x4c/0x90
[   48.951629]        on_each_cpu+0x40/0x7c
[   48.951629]        clock_was_set_work+0x14/0x20
[   48.951660]        process_one_work+0x2b4/0x808
[   48.951660]        worker_thread+0x3c/0x550
[   48.951660]        kthread+0x104/0x148
[   48.951690]        ret_from_fork+0x14/0x24
[   48.951690]
               other info that might help us debug this:

[   48.951690] Chain exists of:
                 tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[   48.951721]  Possible unsafe locking scenario:

[   48.951721]        CPU0                    CPU1
[   48.951721]        ----                    ----
[   48.951721]   lock(hrtimer_bases.lock);
[   48.951751]                                lock(&rt_b->rt_runtime_lock);
[   48.951751]                                lock(hrtimer_bases.lock);
[   48.951751]   lock(tk_core);
[   48.951782]
                *** DEADLOCK ***

[   48.951782] 3 locks held by kworker/0:0/3:
[   48.951782]  #0:  ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[   48.951812]  #1:  (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[   48.951843]  #2:  (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[   48.951843]   stack backtrace:
[   48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[   48.951904] Workqueue: events clock_was_set_work
[   48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[   48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[   48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[   48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[   48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[   48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[   48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[   48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[   48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[   48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[   48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[   48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[   48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[   48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f1 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
otavio pushed a commit that referenced this pull request Jun 13, 2017
mipsxx_pmu_handle_shared_irq() calls irq_work_run() while holding the
pmuint_rwlock for read.  irq_work_run() can, via perf_pending_event(),
call try_to_wake_up() which can try to take rq->lock.

However, perf can also call perf_pmu_enable() (and thus take the
pmuint_rwlock for write) while holding the rq->lock, from
finish_task_switch() via perf_event_context_sched_in().

This leads to an ABBA deadlock:

 PID: 3855   TASK: 8f7ce288  CPU: 2   COMMAND: "process"
  #0 [89c39ac8] __delay at 803b5be4
  #1 [89c39ac8] do_raw_spin_lock at 8008fdcc
  #2 [89c39af8] try_to_wake_up at 8006e47c
  #3 [89c39b38] pollwake at 8018eab0
  #4 [89c39b68] __wake_up_common at 800879f4
  #5 [89c39b98] __wake_up at 800880e4
  #6 [89c39bc8] perf_event_wakeup at 8012109c
  #7 [89c39be8] perf_pending_event at 80121184
  #8 [89c39c08] irq_work_run_list at 801151f0
  #9 [89c39c38] irq_work_run at 80115274
 #10 [89c39c50] mipsxx_pmu_handle_shared_irq at 8002cc7c

 PID: 1481   TASK: 8eaac6a8  CPU: 3   COMMAND: "process"
  #0 [8de7f900] do_raw_write_lock at 800900e0
  #1 [8de7f918] perf_event_context_sched_in at 80122310
  #2 [8de7f938] __perf_event_task_sched_in at 80122608
  #3 [8de7f958] finish_task_switch at 8006b8a4
  #4 [8de7f998] __schedule at 805e4dc4
  #5 [8de7f9f8] schedule at 805e5558
  #6 [8de7fa10] schedule_hrtimeout_range_clock at 805e9984
  #7 [8de7fa70] poll_schedule_timeout at 8018e8f8
  #8 [8de7fa88] do_select at 8018f338
  #9 [8de7fd88] core_sys_select at 8018f5cc
 #10 [8de7fee0] sys_select at 8018f854
 #11 [8de7ff28] syscall_common at 80028fc8

The lock seems to be there to protect the hardware counters so there is
no need to hold it across irq_work_run().

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
otavio pushed a commit that referenced this pull request Jun 13, 2017
Run this:

    touch file0
    for ((; ;))
    {
        mount -t cpuset xxx file0
    }

And this concurrently:

    touch file1
    for ((; ;))
    {
        mount -t cpuset xxx file1
    }

We'll trigger a warning like this:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
 percpu_ref_kill_and_confirm called more than once on css_release!
 CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 Call Trace:
  dump_stack+0x63/0x84
  __warn+0xd1/0xf0
  warn_slowpath_fmt+0x5f/0x80
  percpu_ref_kill_and_confirm+0x92/0xb0
  cgroup_kill_sb+0x95/0xb0
  deactivate_locked_super+0x43/0x70
  deactivate_super+0x46/0x60
 ...
 ---[ end trace a79f61c2a2633700 ]---

Here's a race:

  Thread A				Thread B

  cgroup1_mount()
    # alloc a new cgroup root
    cgroup_setup_root()
					cgroup1_mount()
					  # no sb yet, returns NULL
					  kernfs_pin_sb()

					  # but succeeds in getting the refcnt,
					  # so re-use cgroup root
					  percpu_ref_tryget_live()
    # alloc sb with cgroup root
    cgroup_do_mount()

  cgroup_kill_sb()
					  # alloc another sb with same root
					  cgroup_do_mount()

					cgroup_kill_sb()

We end up using the same cgroup root for two different superblocks,
so percpu_ref_kill() will be called twice on the same root when the
two superblocks are destroyed.

We should fix to make sure the superblock pinning is really successful.

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
otavio pushed a commit that referenced this pull request Aug 7, 2017
commit cdea465 upstream.

A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 #9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace

It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.

The function is smi_timeout().

And we think ffff8810588b5a00 in RDX is a parameter struct smi_info

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
 1) The address included in between  ffff8810588b5a70 and ffff8810588b5a80:
  are inside of ipmi_si_intf.c  see crash> module ffff88085779d2c0

 2) We've found the area which point this.
  It is offset 0x68 of  ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

 If we regards it as struct ipmi_smi in shutdown process
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This was first introduced in 7ea0ed2 ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
gibsson pushed a commit to gibsson/linux-fslc that referenced this pull request Aug 14, 2017
commit f50b878 upstream.

A NULL-pointer dereference bug in gadgetfs was uncovered by syzkaller:

> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [Freescale#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 4820 Comm: syz-executor0 Not tainted 4.12.0-rc4+ Freescale#5
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff880039542dc0 task.stack: ffff88003bdd0000
> RIP: 0010:__list_del_entry_valid+0x7e/0x170 lib/list_debug.c:51
> RSP: 0018:ffff88003bdd6e50 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000010000
> RDX: 0000000000000000 RSI: ffffffff86504948 RDI: ffffffff86504950
> RBP: ffff88003bdd6e68 R08: ffff880039542dc0 R09: ffffffff8778ce00
> R10: ffff88003bdd6e68 R11: dffffc0000000000 R12: 0000000000000000
> R13: dffffc0000000000 R14: 1ffff100077badd2 R15: ffffffff864d2e40
> FS:  0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000002014aff9 CR3: 0000000006022000 CR4: 00000000000006e0
> Call Trace:
>  __list_del_entry include/linux/list.h:116 [inline]
>  list_del include/linux/list.h:124 [inline]
>  usb_gadget_unregister_driver+0x166/0x4c0 drivers/usb/gadget/udc/core.c:1387
>  dev_release+0x80/0x160 drivers/usb/gadget/legacy/inode.c:1187
>  __fput+0x332/0x7f0 fs/file_table.c:209
>  ____fput+0x15/0x20 fs/file_table.c:245
>  task_work_run+0x19b/0x270 kernel/task_work.c:116
>  exit_task_work include/linux/task_work.h:21 [inline]
>  do_exit+0x18a3/0x2820 kernel/exit.c:878
>  do_group_exit+0x149/0x420 kernel/exit.c:982
>  get_signal+0x77f/0x1780 kernel/signal.c:2318
>  do_signal+0xd2/0x2130 arch/x86/kernel/signal.c:808
>  exit_to_usermode_loop+0x1a7/0x240 arch/x86/entry/common.c:157
>  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>  syscall_return_slowpath+0x3ba/0x410 arch/x86/entry/common.c:263
>  entry_SYSCALL_64_fastpath+0xbc/0xbe
> RIP: 0033:0x4461f9
> RSP: 002b:00007fdac2b1ecf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00000000007080c8 RCX: 00000000004461f9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000007080c8
> RBP: 00000000007080a8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007fdac2b1f9c0 R15: 00007fdac2b1f700
> Code: 00 00 00 00 ad de 49 39 c4 74 6a 48 b8 00 02 00 00 00 00 ad de
> 48 89 da 48 39 c3 74 74 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df <80>
> 3c 02 00 0f 85 92 00 00 00 48 8b 13 48 39 f2 75 66 49 8d 7c
> RIP: __list_del_entry_valid+0x7e/0x170 lib/list_debug.c:51 RSP: ffff88003bdd6e50
> ---[ end trace 30e94b1eec4831c8 ]---
> Kernel panic - not syncing: Fatal exception

The bug was caused by dev_release() failing to turn off its
gadget_registered flag after unregistering the gadget driver.  As a
result, when a later user closed the device file before writing a
valid set of descriptors, dev_release() thought the gadget had been
registered and tried to unregister it, even though it had not been.
This led to the NULL pointer dereference.

The fix is simple: turn off the flag when the gadget is unregistered.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gibsson pushed a commit to gibsson/linux-fslc that referenced this pull request Aug 14, 2017
commit cdea465 upstream.

A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 Freescale#1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 Freescale#2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 Freescale#3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 Freescale#4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 Freescale#5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 Freescale#6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 Freescale#7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 Freescale#8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 Freescale#9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace

It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.

The function is smi_timeout().

And we think ffff8810588b5a00 in RDX is a parameter struct smi_info

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
 1) The address included in between  ffff8810588b5a70 and ffff8810588b5a80:
  are inside of ipmi_si_intf.c  see crash> module ffff88085779d2c0

 2) We've found the area which point this.
  It is offset 0x68 of  ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

 If we regards it as struct ipmi_smi in shutdown process
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This was first introduced in 7ea0ed2 ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
gibsson pushed a commit to gibsson/linux-fslc that referenced this pull request Aug 14, 2017
commit b2504a5 upstream.

Dmitry reported warnings occurring in __skb_gso_segment() [1]

All SKB_GSO_DODGY producers can allow user space to feed
packets that trigger the current check.

We could prevent them from doing so, rejecting packets, but
this might add regressions to existing programs.

It turns out our SKB_GSO_DODGY handlers properly set up checksum
information that is needed anyway when packets needs to be segmented.

By checking again skb_needs_check() after skb_mac_gso_segment(),
we should remove these pesky warnings, at a very minor cost.

With help from Willem de Bruijn

[1]
WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 Freescale#5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
 ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
Call Trace:
 [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
 [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
 [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
 [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
 [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
otavio pushed a commit that referenced this pull request Sep 22, 2017
[ Upstream commit cdea465 ]

A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 #9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace

It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.

The function is smi_timeout().

And we think ffff8810588b5a00 in RDX is a parameter struct smi_info

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
 1) The address included in between  ffff8810588b5a70 and ffff8810588b5a80:
  are inside of ipmi_si_intf.c  see crash> module ffff88085779d2c0

 2) We've found the area which point this.
  It is offset 0x68 of  ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

 If we regards it as struct ipmi_smi in shutdown process
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Cc: stable@vger.kernel.org # 3.19

This was first introduced in 7ea0ed2 ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
otavio pushed a commit that referenced this pull request Sep 22, 2017
[ Upstream commit b2504a5 ]

Dmitry reported warnings occurring in __skb_gso_segment() [1]

All SKB_GSO_DODGY producers can allow user space to feed
packets that trigger the current check.

We could prevent them from doing so, rejecting packets, but
this might add regressions to existing programs.

It turns out our SKB_GSO_DODGY handlers properly set up checksum
information that is needed anyway when packets needs to be segmented.

By checking again skb_needs_check() after skb_mac_gso_segment(),
we should remove these pesky warnings, at a very minor cost.

With help from Willem de Bruijn

[1]
WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
 ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
Call Trace:
 [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
 [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
 [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
 [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
 [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
otavio pushed a commit that referenced this pull request May 30, 2025
[ Upstream commit 27b9180 ]

With the device instance lock, there is now a possibility of a deadlock:

[    1.211455] ============================================
[    1.211571] WARNING: possible recursive locking detected
[    1.211687] 6.14.0-rc5-01215-g032756b4ca7a-dirty #5 Not tainted
[    1.211823] --------------------------------------------
[    1.211936] ip/184 is trying to acquire lock:
[    1.212032] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_set_allmulti+0x4e/0xb0
[    1.212207]
[    1.212207] but task is already holding lock:
[    1.212332] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0
[    1.212487]
[    1.212487] other info that might help us debug this:
[    1.212626]  Possible unsafe locking scenario:
[    1.212626]
[    1.212751]        CPU0
[    1.212815]        ----
[    1.212871]   lock(&dev->lock);
[    1.212944]   lock(&dev->lock);
[    1.213016]
[    1.213016]  *** DEADLOCK ***
[    1.213016]
[    1.213143]  May be due to missing lock nesting notation
[    1.213143]
[    1.213294] 3 locks held by ip/184:
[    1.213371]  #0: ffffffff838b53e0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x1b/0xa0
[    1.213543]  #1: ffffffff84e5fc70 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x37/0xa0
[    1.213727]  #2: ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0
[    1.213895]
[    1.213895] stack backtrace:
[    1.213991] CPU: 0 UID: 0 PID: 184 Comm: ip Not tainted 6.14.0-rc5-01215-g032756b4ca7a-dirty #5
[    1.213993] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[    1.213994] Call Trace:
[    1.213995]  <TASK>
[    1.213996]  dump_stack_lvl+0x8e/0xd0
[    1.214000]  print_deadlock_bug+0x28b/0x2a0
[    1.214020]  lock_acquire+0xea/0x2a0
[    1.214027]  __mutex_lock+0xbf/0xd40
[    1.214038]  dev_set_allmulti+0x4e/0xb0 # real_dev->flags & IFF_ALLMULTI
[    1.214040]  vlan_dev_open+0xa5/0x170 # ndo_open on vlandev
[    1.214042]  __dev_open+0x145/0x270
[    1.214046]  __dev_change_flags+0xb0/0x1e0
[    1.214051]  netif_change_flags+0x22/0x60 # IFF_UP vlandev
[    1.214053]  dev_change_flags+0x61/0xb0 # for each device in group from dev->vlan_info
[    1.214055]  vlan_device_event+0x766/0x7c0 # on netdevsim0
[    1.214058]  notifier_call_chain+0x78/0x120
[    1.214062]  netif_open+0x6d/0x90
[    1.214064]  dev_open+0x5b/0xb0 # locks netdevsim0
[    1.214066]  bond_enslave+0x64c/0x1230
[    1.214075]  do_set_master+0x175/0x1e0 # on netdevsim0
[    1.214077]  do_setlink+0x516/0x13b0
[    1.214094]  rtnl_newlink+0xaba/0xb80
[    1.214132]  rtnetlink_rcv_msg+0x440/0x490
[    1.214144]  netlink_rcv_skb+0xeb/0x120
[    1.214150]  netlink_unicast+0x1f9/0x320
[    1.214153]  netlink_sendmsg+0x346/0x3f0
[    1.214157]  __sock_sendmsg+0x86/0xb0
[    1.214160]  ____sys_sendmsg+0x1c8/0x220
[    1.214164]  ___sys_sendmsg+0x28f/0x2d0
[    1.214179]  __x64_sys_sendmsg+0xef/0x140
[    1.214184]  do_syscall_64+0xec/0x1d0
[    1.214190]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[    1.214191] RIP: 0033:0x7f2d1b4a7e56

Device setup:

     netdevsim0 (down)
     ^        ^
  bond        netdevsim1.100@netdevsim1 allmulticast=on (down)

When we enslave the lower device (netdevsim0) which has a vlan, we
propagate vlan's allmuti/promisc flags during ndo_open. This causes
(re)locking on of the real_dev.

Propagate allmulti/promisc on flags change, not on the open. There
is a slight semantics change that vlans that are down now propagate
the flags, but this seems unlikely to result in the real issues.

Reproducer:

  echo 0 1 > /sys/bus/netdevsim/new_device

  dev_path=$(ls -d /sys/bus/netdevsim/devices/netdevsim0/net/*)
  dev=$(echo $dev_path | rev | cut -d/ -f1 | rev)

  ip link set dev $dev name netdevsim0
  ip link set dev netdevsim0 up

  ip link add link netdevsim0 name netdevsim0.100 type vlan id 100
  ip link set dev netdevsim0.100 allmulticast on down
  ip link add name bond1 type bond mode 802.3ad
  ip link set dev netdevsim0 down
  ip link set dev netdevsim0 master bond1
  ip link set dev bond1 up
  ip link show

Reported-by: syzbot+b0c03d76056ef6cd12a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/Z9CfXjLMKn6VLG5d@mini-arch/T/#m15ba130f53227c883e79fb969687d69d670337a0
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313100657.2287455-1-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
paletteguy referenced this pull request in Laerdal/linux-fslc Jun 6, 2025
[ Upstream commit f6205f8 ]

The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. Can be reproduced using [2].

Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().

[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit

write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
 vxlan_xmit+0xb29/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
 vxlan_xmit+0xadf/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f

Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014

[2]
 #!/bin/bash

 set +H
 echo whitelist > /sys/kernel/debug/kcsan
 echo !vxlan_xmit > /sys/kernel/debug/kcsan

 ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
 bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
 taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
 taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Jun 20, 2025
[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Jun 20, 2025
[ Upstream commit f6205f8 ]

The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. Can be reproduced using [2].

Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().

[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit

write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
 vxlan_xmit+0xb29/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
 vxlan_xmit+0xadf/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f

Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014

[2]
 #!/bin/bash

 set +H
 echo whitelist > /sys/kernel/debug/kcsan
 echo !vxlan_xmit > /sys/kernel/debug/kcsan

 ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
 bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
 taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
 taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
eghidoli pushed a commit to eghidoli/linux-fslc that referenced this pull request Jun 23, 2025
[ Upstream commit ee684de ]

As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as
a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        Freescale#1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        Freescale#2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        Freescale#3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        Freescale#4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        Freescale#5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        Freescale#6 0x000000400c16 in main /poc/poc.c:8
        Freescale#7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        Freescale#8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        Freescale#9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        Freescale#1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        Freescale#2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        Freescale#3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947 ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <2524158037@qq.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Jun 24, 2025
[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Jun 24, 2025
[ Upstream commit f6205f8 ]

The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. Can be reproduced using [2].

Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().

[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit

write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
 vxlan_xmit+0xb29/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
 vxlan_xmit+0xadf/0x2380
 dev_hard_start_xmit+0x84/0x2f0
 __dev_queue_xmit+0x45a/0x1650
 packet_xmit+0x100/0x150
 packet_sendmsg+0x2114/0x2ac0
 __sys_sendto+0x318/0x330
 __x64_sys_sendto+0x76/0x90
 x64_sys_call+0x14e8/0x1c00
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f

Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014

[2]
 #!/bin/bash

 set +H
 echo whitelist > /sys/kernel/debug/kcsan
 echo !vxlan_xmit > /sys/kernel/debug/kcsan

 ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
 bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
 taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
 taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Jun 24, 2025
[ Upstream commit ee684de ]

As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as
a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947 ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <2524158037@qq.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Jun 24, 2025
commit c98cc97 upstream.

Running a modified trace-cmd record --nosplice where it does a mmap of the
ring buffer when '--nosplice' is set, caused the following lockdep splat:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.15.0-rc7-test-00002-gfb7d03d8a82f #551 Not tainted
 ------------------------------------------------------
 trace-cmd/1113 is trying to acquire lock:
 ffff888100062888 (&buffer->mutex){+.+.}-{4:4}, at: ring_buffer_map+0x11c/0xe70

 but task is already holding lock:
 ffff888100a5f9f8 (&cpu_buffer->mapping_lock){+.+.}-{4:4}, at: ring_buffer_map+0xcf/0xe70

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #5 (&cpu_buffer->mapping_lock){+.+.}-{4:4}:
        __mutex_lock+0x192/0x18c0
        ring_buffer_map+0xcf/0xe70
        tracing_buffers_mmap+0x1c4/0x3b0
        __mmap_region+0xd8d/0x1f70
        do_mmap+0x9d7/0x1010
        vm_mmap_pgoff+0x20b/0x390
        ksys_mmap_pgoff+0x2e9/0x440
        do_syscall_64+0x79/0x1c0
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #4 (&mm->mmap_lock){++++}-{4:4}:
        __might_fault+0xa5/0x110
        _copy_to_user+0x22/0x80
        _perf_ioctl+0x61b/0x1b70
        perf_ioctl+0x62/0x90
        __x64_sys_ioctl+0x134/0x190
        do_syscall_64+0x79/0x1c0
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #3 (&cpuctx_mutex){+.+.}-{4:4}:
        __mutex_lock+0x192/0x18c0
        perf_event_init_cpu+0x325/0x7c0
        perf_event_init+0x52a/0x5b0
        start_kernel+0x263/0x3e0
        x86_64_start_reservations+0x24/0x30
        x86_64_start_kernel+0x95/0xa0
        common_startup_64+0x13e/0x141

 -> #2 (pmus_lock){+.+.}-{4:4}:
        __mutex_lock+0x192/0x18c0
        perf_event_init_cpu+0xb7/0x7c0
        cpuhp_invoke_callback+0x2c0/0x1030
        __cpuhp_invoke_callback_range+0xbf/0x1f0
        _cpu_up+0x2e7/0x690
        cpu_up+0x117/0x170
        cpuhp_bringup_mask+0xd5/0x120
        bringup_nonboot_cpus+0x13d/0x170
        smp_init+0x2b/0xf0
        kernel_init_freeable+0x441/0x6d0
        kernel_init+0x1e/0x160
        ret_from_fork+0x34/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #1 (cpu_hotplug_lock){++++}-{0:0}:
        cpus_read_lock+0x2a/0xd0
        ring_buffer_resize+0x610/0x14e0
        __tracing_resize_ring_buffer.part.0+0x42/0x120
        tracing_set_tracer+0x7bd/0xa80
        tracing_set_trace_write+0x132/0x1e0
        vfs_write+0x21c/0xe80
        ksys_write+0xf9/0x1c0
        do_syscall_64+0x79/0x1c0
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #0 (&buffer->mutex){+.+.}-{4:4}:
        __lock_acquire+0x1405/0x2210
        lock_acquire+0x174/0x310
        __mutex_lock+0x192/0x18c0
        ring_buffer_map+0x11c/0xe70
        tracing_buffers_mmap+0x1c4/0x3b0
        __mmap_region+0xd8d/0x1f70
        do_mmap+0x9d7/0x1010
        vm_mmap_pgoff+0x20b/0x390
        ksys_mmap_pgoff+0x2e9/0x440
        do_syscall_64+0x79/0x1c0
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 other info that might help us debug this:

 Chain exists of:
   &buffer->mutex --> &mm->mmap_lock --> &cpu_buffer->mapping_lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&cpu_buffer->mapping_lock);
                                lock(&mm->mmap_lock);
                                lock(&cpu_buffer->mapping_lock);
   lock(&buffer->mutex);

  *** DEADLOCK ***

 2 locks held by trace-cmd/1113:
  #0: ffff888106b847e0 (&mm->mmap_lock){++++}-{4:4}, at: vm_mmap_pgoff+0x192/0x390
  #1: ffff888100a5f9f8 (&cpu_buffer->mapping_lock){+.+.}-{4:4}, at: ring_buffer_map+0xcf/0xe70

 stack backtrace:
 CPU: 5 UID: 0 PID: 1113 Comm: trace-cmd Not tainted 6.15.0-rc7-test-00002-gfb7d03d8a82f #551 PREEMPT
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x6e/0xa0
  print_circular_bug.cold+0x178/0x1be
  check_noncircular+0x146/0x160
  __lock_acquire+0x1405/0x2210
  lock_acquire+0x174/0x310
  ? ring_buffer_map+0x11c/0xe70
  ? ring_buffer_map+0x11c/0xe70
  ? __mutex_lock+0x169/0x18c0
  __mutex_lock+0x192/0x18c0
  ? ring_buffer_map+0x11c/0xe70
  ? ring_buffer_map+0x11c/0xe70
  ? function_trace_call+0x296/0x370
  ? __pfx___mutex_lock+0x10/0x10
  ? __pfx_function_trace_call+0x10/0x10
  ? __pfx___mutex_lock+0x10/0x10
  ? _raw_spin_unlock+0x2d/0x50
  ? ring_buffer_map+0x11c/0xe70
  ? ring_buffer_map+0x11c/0xe70
  ? __mutex_lock+0x5/0x18c0
  ring_buffer_map+0x11c/0xe70
  ? do_raw_spin_lock+0x12d/0x270
  ? find_held_lock+0x2b/0x80
  ? _raw_spin_unlock+0x2d/0x50
  ? rcu_is_watching+0x15/0xb0
  ? _raw_spin_unlock+0x2d/0x50
  ? trace_preempt_on+0xd0/0x110
  tracing_buffers_mmap+0x1c4/0x3b0
  __mmap_region+0xd8d/0x1f70
  ? ring_buffer_lock_reserve+0x99/0xff0
  ? __pfx___mmap_region+0x10/0x10
  ? ring_buffer_lock_reserve+0x99/0xff0
  ? __pfx_ring_buffer_lock_reserve+0x10/0x10
  ? __pfx_ring_buffer_lock_reserve+0x10/0x10
  ? bpf_lsm_mmap_addr+0x4/0x10
  ? security_mmap_addr+0x46/0xd0
  ? lock_is_held_type+0xd9/0x130
  do_mmap+0x9d7/0x1010
  ? 0xffffffffc0370095
  ? __pfx_do_mmap+0x10/0x10
  vm_mmap_pgoff+0x20b/0x390
  ? __pfx_vm_mmap_pgoff+0x10/0x10
  ? 0xffffffffc0370095
  ksys_mmap_pgoff+0x2e9/0x440
  do_syscall_64+0x79/0x1c0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fb0963a7de2
 Code: 00 00 00 0f 1f 44 00 00 41 f7 c1 ff 0f 00 00 75 27 55 89 cd 53 48 89 fb 48 85 ff 74 3b 41 89 ea 48 89 df b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 5b 5d c3 0f 1f 00 48 8b 05 e1 9f 0d 00 64
 RSP: 002b:00007ffdcc8fb878 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb0963a7de2
 RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
 RBP: 0000000000000001 R08: 0000000000000006 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007ffdcc8fbe68 R14: 00007fb096628000 R15: 00005633e01a5c90
  </TASK>

The issue is that cpus_read_lock() is taken within buffer->mutex. The
memory mapped pages are taken with the mmap_lock held. The buffer->mutex
is taken within the cpu_buffer->mapping_lock. There's quite a chain with
all these locks, where the deadlock can be fixed by moving the
cpus_read_lock() outside the taking of the buffer->mutex.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20250527105820.0f45d045@gandalf.local.home
Fixes: 117c392 ("ring-buffer: Introducing ring-buffer mapping functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
otavio pushed a commit that referenced this pull request Aug 8, 2025
[ Upstream commit 3bbe467 ]

When reconnecting a channel in smb2_reconnect_server(), a dummy tcon
is passed down to smb2_reconnect() with ->query_interface
uninitialized, so we can't call queue_delayed_work() on it.

Fix the following warning by ensuring that we're queueing the delayed
worker from correct tcon.

WARNING: CPU: 4 PID: 1126 at kernel/workqueue.c:2498 __queue_delayed_work+0x1d2/0x200
Modules linked in: cifs cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
CPU: 4 UID: 0 PID: 1126 Comm: kworker/4:0 Not tainted 6.16.0-rc3 #5 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-4.fc42 04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__queue_delayed_work+0x1d2/0x200
Code: 41 5e 41 5f e9 7f ee ff ff 90 0f 0b 90 e9 5d ff ff ff bf 02 00
00 00 e8 6c f3 07 00 89 c3 eb bd 90 0f 0b 90 e9 57 f> 0b 90 e9 65 fe
ff ff 90 0f 0b 90 e9 72 fe ff ff 90 0f 0b 90 e9
RSP: 0018:ffffc900014afad8 EFLAGS: 00010003
RAX: 0000000000000000 RBX: ffff888124d99988 RCX: ffffffff81399cc1
RDX: dffffc0000000000 RSI: ffff888114326e00 RDI: ffff888124d999f0
RBP: 000000000000ea60 R08: 0000000000000001 R09: ffffed10249b3331
R10: ffff888124d9998f R11: 0000000000000004 R12: 0000000000000040
R13: ffff888114326e00 R14: ffff888124d999d8 R15: ffff888114939020
FS:  0000000000000000(0000) GS:ffff88829f7fe000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe7a2b4038 CR3: 0000000120a6f000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 queue_delayed_work_on+0xb4/0xc0
 smb2_reconnect+0xb22/0xf50 [cifs]
 smb2_reconnect_server+0x413/0xd40 [cifs]
 ? __pfx_smb2_reconnect_server+0x10/0x10 [cifs]
 ? local_clock_noinstr+0xd/0xd0
 ? local_clock+0x15/0x30
 ? lock_release+0x29b/0x390
 process_one_work+0x4c5/0xa10
 ? __pfx_process_one_work+0x10/0x10
 ? __list_add_valid_or_report+0x37/0x120
 worker_thread+0x2f1/0x5a0
 ? __kthread_parkme+0xde/0x100
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x1fe/0x380
 ? kthread+0x10f/0x380
 ? __pfx_kthread+0x10/0x10
 ? local_clock_noinstr+0xd/0xd0
 ? ret_from_fork+0x1b/0x1f0
 ? local_clock+0x15/0x30
 ? lock_release+0x29b/0x390
 ? rcu_is_watching+0x20/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x15b/0x1f0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
irq event stamp: 1116206
hardirqs last  enabled at (1116205): [<ffffffff8143af42>] __up_console_sem+0x52/0x60
hardirqs last disabled at (1116206): [<ffffffff81399f0e>] queue_delayed_work_on+0x6e/0xc0
softirqs last  enabled at (1116138): [<ffffffffc04562fd>] __smb_send_rqst+0x42d/0x950 [cifs]
softirqs last disabled at (1116136): [<ffffffff823d35e1>] release_sock+0x21/0xf0

Cc: linux-cifs@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Fixes: 42ca547 ("cifs: do not disable interface polling on failure")
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Aug 8, 2025
…terface

[ Upstream commit a90b2a1 ]

collect_md property on xfrm interfaces can only be set on device creation,
thus xfrmi_changelink() should fail when called on such interfaces.

The check to enforce this was done only in the case where the xi was
returned from xfrmi_locate() which doesn't look for the collect_md
interface, and thus the validation was never reached.

Calling changelink would thus errornously place the special interface xi
in the xfrmi_net->xfrmi hash, but since it also exists in the
xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when
the net namespace was taken down [1].

Change the check to use the xi from netdev_priv which is available earlier
in the function to prevent changes in xfrm collect_md interfaces.

[1] resulting oops:
[    8.516540] kernel BUG at net/core/dev.c:12029!
[    8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[    8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary)
[    8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    8.516569] Workqueue: netns cleanup_net
[    8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0
[    8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24
[    8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206
[    8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60
[    8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122
[    8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100
[    8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00
[    8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00
[    8.516615] FS:  0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000
[    8.516619] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0
[    8.516625] PKRU: 55555554
[    8.516627] Call Trace:
[    8.516632]  <TASK>
[    8.516635]  ? rtnl_is_locked+0x15/0x20
[    8.516641]  ? unregister_netdevice_queue+0x29/0xf0
[    8.516650]  ops_undo_list+0x1f2/0x220
[    8.516659]  cleanup_net+0x1ad/0x2e0
[    8.516664]  process_one_work+0x160/0x380
[    8.516673]  worker_thread+0x2aa/0x3c0
[    8.516679]  ? __pfx_worker_thread+0x10/0x10
[    8.516686]  kthread+0xfb/0x200
[    8.516690]  ? __pfx_kthread+0x10/0x10
[    8.516693]  ? __pfx_kthread+0x10/0x10
[    8.516697]  ret_from_fork+0x82/0xf0
[    8.516705]  ? __pfx_kthread+0x10/0x10
[    8.516709]  ret_from_fork_asm+0x1a/0x30
[    8.516718]  </TASK>

Fixes: abc340b ("xfrm: interface: support collect metadata mode")
Reported-by: Lonial Con <kongln9170@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ivitro pushed a commit to ivitro/linux-fslc that referenced this pull request Sep 2, 2025
[ Upstream commit ee684de ]

As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as
a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        Freescale#1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        Freescale#2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        Freescale#3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        Freescale#4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        Freescale#5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        Freescale#6 0x000000400c16 in main /poc/poc.c:8
        Freescale#7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        Freescale#8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        Freescale#9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        Freescale#1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        Freescale#2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        Freescale#3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947 ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <2524158037@qq.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
efecanicoz-atlas pushed a commit to efecanicoz-atlas/linux-fslc that referenced this pull request Oct 21, 2025
[ Upstream commit 48918ca ]

The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    Freescale#1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    Freescale#2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    Freescale#3 0x7f29ce849cc2 in raise raise.c:27
    Freescale#4 0x7f29ce8324ac in abort abort.c:81
    Freescale#5 0x565358f662d4 in check_leaks builtin-test.c:226
    Freescale#6 0x565358f6682e in run_test_child builtin-test.c:344
    Freescale#7 0x565358ef7121 in start_command run-command.c:128
    Freescale#8 0x565358f67273 in start_test builtin-test.c:545
    Freescale#9 0x565358f6771d in __cmd_test builtin-test.c:647
    Freescale#10 0x565358f682bd in cmd_test builtin-test.c:849
    Freescale#11 0x565358ee5ded in run_builtin perf.c:349
    Freescale#12 0x565358ee6085 in handle_internal_command perf.c:401
    Freescale#13 0x565358ee61de in run_argv perf.c:448
    Freescale#14 0x565358ee6527 in main perf.c:555
    Freescale#15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    Freescale#16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    Freescale#17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
efecanicoz-atlas pushed a commit to efecanicoz-atlas/linux-fslc that referenced this pull request Oct 21, 2025
commit 0570327 upstream.

Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  Freescale#1 [3800313fb88] pci_remove_bus_device at c915105ba
  Freescale#2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  Freescale#3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  Freescale#4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  Freescale#5 [3800313fca0] __zpci_event_availability at c90fb3dca
  Freescale#6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  Freescale#7 [3800313fd60] crw_collect_info at c91905822
  Freescale#8 [3800313fe10] kthread at c90feb390
  Freescale#9 [3800313fe68] __ret_from_fork at c90f6aa64
  Freescale#10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This being the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible of course though given that this lack of
locking persisted so long observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
efecanicoz-atlas pushed a commit to efecanicoz-atlas/linux-fslc that referenced this pull request Nov 3, 2025
[ Upstream commit 4840cea ]

The syzbot reported issue in __hfsplus_ext_cache_extent():

[   70.194323][ T9350] BUG: KMSAN: uninit-value in __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195022][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195530][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.195998][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.196458][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.196959][ T9350]  cont_write_begin+0x1000/0x1950
[   70.197416][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.197873][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.198374][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.198892][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.199393][ T9350]  vfs_write+0xb0f/0x14e0
[   70.199771][ T9350]  ksys_write+0x23e/0x490
[   70.200149][ T9350]  __x64_sys_write+0x97/0xf0
[   70.200570][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.201065][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.201506][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.202054][ T9350]
[   70.202279][ T9350] Uninit was created at:
[   70.202693][ T9350]  __kmalloc_noprof+0x621/0xf80
[   70.203149][ T9350]  hfsplus_find_init+0x8d/0x1d0
[   70.203602][ T9350]  hfsplus_file_extend+0x6ca/0x1cf0
[   70.204087][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.204561][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.205074][ T9350]  cont_write_begin+0x1000/0x1950
[   70.205547][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.206017][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.206519][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.207042][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.207552][ T9350]  vfs_write+0xb0f/0x14e0
[   70.207961][ T9350]  ksys_write+0x23e/0x490
[   70.208375][ T9350]  __x64_sys_write+0x97/0xf0
[   70.208810][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.209255][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.209680][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.210230][ T9350]
[   70.210454][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Not tainted 6.12.0-rc5 Freescale#5
[   70.211174][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.212115][ T9350] =====================================================
[   70.212734][ T9350] Disabling lock debugging due to kernel taint
[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic set ...
[   70.213858][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Tainted: G    B              6.12.0-rc5 Freescale#5
[   70.214679][ T9350] Tainted: [B]=BAD_PAGE
[   70.215057][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.215999][ T9350] Call Trace:
[   70.216309][ T9350]  <TASK>
[   70.216585][ T9350]  dump_stack_lvl+0x1fd/0x2b0
[   70.217025][ T9350]  dump_stack+0x1e/0x30
[   70.217421][ T9350]  panic+0x502/0xca0
[   70.217803][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0

[   70.218294][ Message fromT sy9350]  kmsan_report+0x296/slogd@syzkaller 0x2aat Aug 18 22:11:058 ...
 kernel
:[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic [   70.220179][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
set ...
[   70.221254][ T9350]  ? __msan_warning+0x96/0x120
[   70.222066][ T9350]  ? __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.223023][ T9350]  ? hfsplus_file_extend+0x74f/0x1cf0
[   70.224120][ T9350]  ? hfsplus_get_block+0xe16/0x17b0
[   70.224946][ T9350]  ? __block_write_begin_int+0x962/0x2ce0
[   70.225756][ T9350]  ? cont_write_begin+0x1000/0x1950
[   70.226337][ T9350]  ? hfsplus_write_begin+0x85/0x130
[   70.226852][ T9350]  ? generic_perform_write+0x3e8/0x1060
[   70.227405][ T9350]  ? __generic_file_write_iter+0x215/0x460
[   70.227979][ T9350]  ? generic_file_write_iter+0x109/0x5e0
[   70.228540][ T9350]  ? vfs_write+0xb0f/0x14e0
[   70.228997][ T9350]  ? ksys_write+0x23e/0x490
[   70.229458][ T9350]  ? __x64_sys_write+0x97/0xf0
[   70.229939][ T9350]  ? x64_sys_call+0x3015/0x3cf0
[   70.230432][ T9350]  ? do_syscall_64+0xd9/0x1d0
[   70.230941][ T9350]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.231926][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.232738][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.233711][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.234516][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.235398][ T9350]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.236323][ T9350]  ? hfsplus_brec_find+0x218/0x9f0
[   70.237090][ T9350]  ? __pfx_hfs_find_rec_by_key+0x10/0x10
[   70.237938][ T9350]  ? __msan_instrument_asm_store+0xbf/0xf0
[   70.238827][ T9350]  ? __msan_metadata_ptr_for_store_4+0x27/0x40
[   70.239772][ T9350]  ? __hfsplus_ext_write_extent+0x536/0x620
[   70.240666][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.241175][ T9350]  __msan_warning+0x96/0x120
[   70.241645][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.242223][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.242748][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.243255][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.243878][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.244400][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.244967][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.245531][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.246079][ T9350]  cont_write_begin+0x1000/0x1950
[   70.246598][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.247105][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.247650][ T9350]  ? __pfx_hfsplus_write_begin+0x10/0x10
[   70.248211][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.248752][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.249314][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.249856][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.250487][ T9350]  vfs_write+0xb0f/0x14e0
[   70.250930][ T9350]  ? __pfx_generic_file_write_iter+0x10/0x10
[   70.251530][ T9350]  ksys_write+0x23e/0x490
[   70.251974][ T9350]  __x64_sys_write+0x97/0xf0
[   70.252450][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.252924][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.253384][ T9350]  ? irqentry_exit+0x16/0x60
[   70.253844][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.254430][ T9350] RIP: 0033:0x7f7a92adffc9
[   70.254873][ T9350] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[   70.256674][ T9350] RSP: 002b:00007fff0bca3188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   70.257485][ T9350] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a92adffc9
[   70.258246][ T9350] RDX: 000000000208e24b RSI: 0000000020000100 RDI: 0000000000000004
[   70.258998][ T9350] RBP: 00007fff0bca31a0 R08: 00007fff0bca31a0 R09: 00007fff0bca31a0
[   70.259769][ T9350] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e0d75f8250
[   70.260520][ T9350] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   70.261286][ T9350]  </TASK>
[   70.262026][ T9350] Kernel Offset: disabled

(gdb) l *__hfsplus_ext_cache_extent+0x7d0
0xffffffff8318aef0 is in __hfsplus_ext_cache_extent (fs/hfsplus/extents.c:168).
163		fd->key->ext.cnid = 0;
164		res = hfs_brec_find(fd, hfs_find_rec_by_key);
165		if (res && res != -ENOENT)
166			return res;
167		if (fd->key->ext.cnid != fd->search_key->ext.cnid ||
168		    fd->key->ext.fork_type != fd->search_key->ext.fork_type)
169			return -ENOENT;
170		if (fd->entrylength != sizeof(hfsplus_extent_rec))
171			return -EIO;
172		hfs_bnode_read(fd->bnode, extent, fd->entryoffset,

The __hfsplus_ext_cache_extent() calls __hfsplus_ext_read_extent():

res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
				block, HFSPLUS_IS_RSRC(inode) ?
					HFSPLUS_TYPE_RSRC :
					HFSPLUS_TYPE_DATA);

And if inode->i_ino could be equal to zero or any non-available CNID,
then hfs_brec_find() could not find the record in the tree. As a result,
fd->key could be compared with fd->search_key. But hfsplus_find_init()
uses kmalloc() for fd->key and fd->search_key allocation:

int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
{
<skipped>
        ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL);
        if (!ptr)
                return -ENOMEM;
        fd->search_key = ptr;
        fd->key = ptr + tree->max_key_len + 2;
<skipped>
}

Finally, fd->key is still not initialized if hfs_brec_find()
has found nothing.

This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.

Reported-by: syzbot <syzbot+55ad87f38795d6787521@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=55ad87f38795d6787521
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225232.126402-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Nov 3, 2025
[ Upstream commit 48918ca ]

The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    #6 0x565358f6682e in run_test_child builtin-test.c:344
    #7 0x565358ef7121 in start_command run-command.c:128
    #8 0x565358f67273 in start_test builtin-test.c:545
    #9 0x565358f6771d in __cmd_test builtin-test.c:647
    #10 0x565358f682bd in cmd_test builtin-test.c:849
    #11 0x565358ee5ded in run_builtin perf.c:349
    #12 0x565358ee6085 in handle_internal_command perf.c:401
    #13 0x565358ee61de in run_argv perf.c:448
    #14 0x565358ee6527 in main perf.c:555
    #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Nov 3, 2025
commit 0570327 upstream.

Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This being the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible of course though given that this lack of
locking persisted so long observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
otavio pushed a commit that referenced this pull request Nov 3, 2025
[ Upstream commit 4840cea ]

The syzbot reported issue in __hfsplus_ext_cache_extent():

[   70.194323][ T9350] BUG: KMSAN: uninit-value in __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195022][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195530][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.195998][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.196458][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.196959][ T9350]  cont_write_begin+0x1000/0x1950
[   70.197416][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.197873][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.198374][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.198892][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.199393][ T9350]  vfs_write+0xb0f/0x14e0
[   70.199771][ T9350]  ksys_write+0x23e/0x490
[   70.200149][ T9350]  __x64_sys_write+0x97/0xf0
[   70.200570][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.201065][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.201506][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.202054][ T9350]
[   70.202279][ T9350] Uninit was created at:
[   70.202693][ T9350]  __kmalloc_noprof+0x621/0xf80
[   70.203149][ T9350]  hfsplus_find_init+0x8d/0x1d0
[   70.203602][ T9350]  hfsplus_file_extend+0x6ca/0x1cf0
[   70.204087][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.204561][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.205074][ T9350]  cont_write_begin+0x1000/0x1950
[   70.205547][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.206017][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.206519][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.207042][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.207552][ T9350]  vfs_write+0xb0f/0x14e0
[   70.207961][ T9350]  ksys_write+0x23e/0x490
[   70.208375][ T9350]  __x64_sys_write+0x97/0xf0
[   70.208810][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.209255][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.209680][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.210230][ T9350]
[   70.210454][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Not tainted 6.12.0-rc5 #5
[   70.211174][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.212115][ T9350] =====================================================
[   70.212734][ T9350] Disabling lock debugging due to kernel taint
[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic set ...
[   70.213858][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Tainted: G    B              6.12.0-rc5 #5
[   70.214679][ T9350] Tainted: [B]=BAD_PAGE
[   70.215057][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.215999][ T9350] Call Trace:
[   70.216309][ T9350]  <TASK>
[   70.216585][ T9350]  dump_stack_lvl+0x1fd/0x2b0
[   70.217025][ T9350]  dump_stack+0x1e/0x30
[   70.217421][ T9350]  panic+0x502/0xca0
[   70.217803][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0

[   70.218294][ Message fromT sy9350]  kmsan_report+0x296/slogd@syzkaller 0x2aat Aug 18 22:11:058 ...
 kernel
:[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic [   70.220179][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
set ...
[   70.221254][ T9350]  ? __msan_warning+0x96/0x120
[   70.222066][ T9350]  ? __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.223023][ T9350]  ? hfsplus_file_extend+0x74f/0x1cf0
[   70.224120][ T9350]  ? hfsplus_get_block+0xe16/0x17b0
[   70.224946][ T9350]  ? __block_write_begin_int+0x962/0x2ce0
[   70.225756][ T9350]  ? cont_write_begin+0x1000/0x1950
[   70.226337][ T9350]  ? hfsplus_write_begin+0x85/0x130
[   70.226852][ T9350]  ? generic_perform_write+0x3e8/0x1060
[   70.227405][ T9350]  ? __generic_file_write_iter+0x215/0x460
[   70.227979][ T9350]  ? generic_file_write_iter+0x109/0x5e0
[   70.228540][ T9350]  ? vfs_write+0xb0f/0x14e0
[   70.228997][ T9350]  ? ksys_write+0x23e/0x490
[   70.229458][ T9350]  ? __x64_sys_write+0x97/0xf0
[   70.229939][ T9350]  ? x64_sys_call+0x3015/0x3cf0
[   70.230432][ T9350]  ? do_syscall_64+0xd9/0x1d0
[   70.230941][ T9350]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.231926][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.232738][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.233711][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.234516][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.235398][ T9350]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.236323][ T9350]  ? hfsplus_brec_find+0x218/0x9f0
[   70.237090][ T9350]  ? __pfx_hfs_find_rec_by_key+0x10/0x10
[   70.237938][ T9350]  ? __msan_instrument_asm_store+0xbf/0xf0
[   70.238827][ T9350]  ? __msan_metadata_ptr_for_store_4+0x27/0x40
[   70.239772][ T9350]  ? __hfsplus_ext_write_extent+0x536/0x620
[   70.240666][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.241175][ T9350]  __msan_warning+0x96/0x120
[   70.241645][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.242223][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.242748][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.243255][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.243878][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.244400][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.244967][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.245531][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.246079][ T9350]  cont_write_begin+0x1000/0x1950
[   70.246598][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.247105][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.247650][ T9350]  ? __pfx_hfsplus_write_begin+0x10/0x10
[   70.248211][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.248752][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.249314][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.249856][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.250487][ T9350]  vfs_write+0xb0f/0x14e0
[   70.250930][ T9350]  ? __pfx_generic_file_write_iter+0x10/0x10
[   70.251530][ T9350]  ksys_write+0x23e/0x490
[   70.251974][ T9350]  __x64_sys_write+0x97/0xf0
[   70.252450][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.252924][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.253384][ T9350]  ? irqentry_exit+0x16/0x60
[   70.253844][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.254430][ T9350] RIP: 0033:0x7f7a92adffc9
[   70.254873][ T9350] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[   70.256674][ T9350] RSP: 002b:00007fff0bca3188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   70.257485][ T9350] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a92adffc9
[   70.258246][ T9350] RDX: 000000000208e24b RSI: 0000000020000100 RDI: 0000000000000004
[   70.258998][ T9350] RBP: 00007fff0bca31a0 R08: 00007fff0bca31a0 R09: 00007fff0bca31a0
[   70.259769][ T9350] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e0d75f8250
[   70.260520][ T9350] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   70.261286][ T9350]  </TASK>
[   70.262026][ T9350] Kernel Offset: disabled

(gdb) l *__hfsplus_ext_cache_extent+0x7d0
0xffffffff8318aef0 is in __hfsplus_ext_cache_extent (fs/hfsplus/extents.c:168).
163		fd->key->ext.cnid = 0;
164		res = hfs_brec_find(fd, hfs_find_rec_by_key);
165		if (res && res != -ENOENT)
166			return res;
167		if (fd->key->ext.cnid != fd->search_key->ext.cnid ||
168		    fd->key->ext.fork_type != fd->search_key->ext.fork_type)
169			return -ENOENT;
170		if (fd->entrylength != sizeof(hfsplus_extent_rec))
171			return -EIO;
172		hfs_bnode_read(fd->bnode, extent, fd->entryoffset,

The __hfsplus_ext_cache_extent() calls __hfsplus_ext_read_extent():

res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
				block, HFSPLUS_IS_RSRC(inode) ?
					HFSPLUS_TYPE_RSRC :
					HFSPLUS_TYPE_DATA);

And if inode->i_ino could be equal to zero or any non-available CNID,
then hfs_brec_find() could not find the record in the tree. As a result,
fd->key could be compared with fd->search_key. But hfsplus_find_init()
uses kmalloc() for fd->key and fd->search_key allocation:

int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
{
<skipped>
        ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL);
        if (!ptr)
                return -ENOMEM;
        fd->search_key = ptr;
        fd->key = ptr + tree->max_key_len + 2;
<skipped>
}

Finally, fd->key is still not initialized if hfs_brec_find()
has found nothing.

This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.

Reported-by: syzbot <syzbot+55ad87f38795d6787521@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=55ad87f38795d6787521
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225232.126402-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
fedepell pushed a commit to fedepell/linux-fslc that referenced this pull request Dec 7, 2025
[ Upstream commit 38f5024 ]

With CONFIG_PROVE_RCU_LIST=y and by executing

  $ netcat -l --sctp &
  $ netcat --sctp localhost &
  $ ss --sctp

one can trigger the following Lockdep-RCU splat(s):

  WARNING: suspicious RCU usage
  6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 Not tainted
  -----------------------------
  net/sctp/diag.c:76 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  2 locks held by ss/215:
   #0: ffff9c740828bec0 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{4:4}, at: __netlink_dump_start+0x84/0x2b0
   Freescale#1: ffff9c7401d72cd0 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_sock_dump+0x38/0x200

  stack backtrace:
  CPU: 0 UID: 0 PID: 215 Comm: ss Not tainted 6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x90
   lockdep_rcu_suspicious.cold+0x4e/0xa3
   inet_sctp_diag_fill.isra.0+0x4b1/0x5d0
   sctp_sock_dump+0x131/0x200
   sctp_transport_traverse_process+0x170/0x1b0
   ? __pfx_sctp_sock_filter+0x10/0x10
   ? __pfx_sctp_sock_dump+0x10/0x10
   sctp_diag_dump+0x103/0x140
   __inet_diag_dump+0x70/0xb0
   netlink_dump+0x148/0x490
   __netlink_dump_start+0x1f3/0x2b0
   inet_diag_handler_cmd+0xcd/0x100
   ? __pfx_inet_diag_dump_start+0x10/0x10
   ? __pfx_inet_diag_dump+0x10/0x10
   ? __pfx_inet_diag_dump_done+0x10/0x10
   sock_diag_rcv_msg+0x18e/0x320
   ? __pfx_sock_diag_rcv_msg+0x10/0x10
   netlink_rcv_skb+0x4d/0x100
   netlink_unicast+0x1d7/0x2b0
   netlink_sendmsg+0x203/0x450
   ____sys_sendmsg+0x30c/0x340
   ___sys_sendmsg+0x94/0xf0
   __sys_sendmsg+0x83/0xf0
   do_syscall_64+0xbb/0x390
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   ...
   </TASK>

Fixes: 8f840e4 ("sctp: add the sctp_diag.c file")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251028161506.3294376-2-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
efecanicoz-atlas pushed a commit to efecanicoz-atlas/linux-fslc that referenced this pull request Dec 9, 2025
[ Upstream commit 38f5024 ]

With CONFIG_PROVE_RCU_LIST=y and by executing

  $ netcat -l --sctp &
  $ netcat --sctp localhost &
  $ ss --sctp

one can trigger the following Lockdep-RCU splat(s):

  WARNING: suspicious RCU usage
  6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 Not tainted
  -----------------------------
  net/sctp/diag.c:76 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  2 locks held by ss/215:
   #0: ffff9c740828bec0 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{4:4}, at: __netlink_dump_start+0x84/0x2b0
   Freescale#1: ffff9c7401d72cd0 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_sock_dump+0x38/0x200

  stack backtrace:
  CPU: 0 UID: 0 PID: 215 Comm: ss Not tainted 6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x90
   lockdep_rcu_suspicious.cold+0x4e/0xa3
   inet_sctp_diag_fill.isra.0+0x4b1/0x5d0
   sctp_sock_dump+0x131/0x200
   sctp_transport_traverse_process+0x170/0x1b0
   ? __pfx_sctp_sock_filter+0x10/0x10
   ? __pfx_sctp_sock_dump+0x10/0x10
   sctp_diag_dump+0x103/0x140
   __inet_diag_dump+0x70/0xb0
   netlink_dump+0x148/0x490
   __netlink_dump_start+0x1f3/0x2b0
   inet_diag_handler_cmd+0xcd/0x100
   ? __pfx_inet_diag_dump_start+0x10/0x10
   ? __pfx_inet_diag_dump+0x10/0x10
   ? __pfx_inet_diag_dump_done+0x10/0x10
   sock_diag_rcv_msg+0x18e/0x320
   ? __pfx_sock_diag_rcv_msg+0x10/0x10
   netlink_rcv_skb+0x4d/0x100
   netlink_unicast+0x1d7/0x2b0
   netlink_sendmsg+0x203/0x450
   ____sys_sendmsg+0x30c/0x340
   ___sys_sendmsg+0x94/0xf0
   __sys_sendmsg+0x83/0xf0
   do_syscall_64+0xbb/0x390
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   ...
   </TASK>

Fixes: 8f840e4 ("sctp: add the sctp_diag.c file")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251028161506.3294376-2-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
efecanicoz-atlas pushed a commit to efecanicoz-atlas/linux-fslc that referenced this pull request Dec 9, 2025
[ Upstream commit 84bbe32 ]

On completion of i915_vma_pin_ww(), a synchronous variant of
dma_fence_work_commit() is called.  When pinning a VMA to GGTT address
space on a Cherry View family processor, or on a Broxton generation SoC
with VTD enabled, i.e., when stop_machine() is then called from
intel_ggtt_bind_vma(), that can potentially lead to lock inversion among
reservation_ww and cpu_hotplug locks.

[86.861179] ======================================================
[86.861193] WARNING: possible circular locking dependency detected
[86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ Freescale#1 Tainted: G     U
[86.861226] ------------------------------------------------------
[86.861238] i915_module_loa/1432 is trying to acquire lock:
[86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50
[86.861290]
but task is already holding lock:
[86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.862233]
which lock already depends on the new lock.
[86.862251]
the existing dependency chain (in reverse order) is:
[86.862265]
-> Freescale#5 (reservation_ww_class_mutex){+.+.}-{3:3}:
[86.862292]        dma_resv_lockdep+0x19a/0x390
[86.862315]        do_one_initcall+0x60/0x3f0
[86.862334]        kernel_init_freeable+0x3cd/0x680
[86.862353]        kernel_init+0x1b/0x200
[86.862369]        ret_from_fork+0x47/0x70
[86.862383]        ret_from_fork_asm+0x1a/0x30
[86.862399]
-> Freescale#4 (reservation_ww_class_acquire){+.+.}-{0:0}:
[86.862425]        dma_resv_lockdep+0x178/0x390
[86.862440]        do_one_initcall+0x60/0x3f0
[86.862454]        kernel_init_freeable+0x3cd/0x680
[86.862470]        kernel_init+0x1b/0x200
[86.862482]        ret_from_fork+0x47/0x70
[86.862495]        ret_from_fork_asm+0x1a/0x30
[86.862509]
-> Freescale#3 (&mm->mmap_lock){++++}-{3:3}:
[86.862531]        down_read_killable+0x46/0x1e0
[86.862546]        lock_mm_and_find_vma+0xa2/0x280
[86.862561]        do_user_addr_fault+0x266/0x8e0
[86.862578]        exc_page_fault+0x8a/0x2f0
[86.862593]        asm_exc_page_fault+0x27/0x30
[86.862607]        filldir64+0xeb/0x180
[86.862620]        kernfs_fop_readdir+0x118/0x480
[86.862635]        iterate_dir+0xcf/0x2b0
[86.862648]        __x64_sys_getdents64+0x84/0x140
[86.862661]        x64_sys_call+0x1058/0x2660
[86.862675]        do_syscall_64+0x91/0xe90
[86.862689]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.862703]
-> Freescale#2 (&root->kernfs_rwsem){++++}-{3:3}:
[86.862725]        down_write+0x3e/0xf0
[86.862738]        kernfs_add_one+0x30/0x3c0
[86.862751]        kernfs_create_dir_ns+0x53/0xb0
[86.862765]        internal_create_group+0x134/0x4c0
[86.862779]        sysfs_create_group+0x13/0x20
[86.862792]        topology_add_dev+0x1d/0x30
[86.862806]        cpuhp_invoke_callback+0x4b5/0x850
[86.862822]        cpuhp_issue_call+0xbf/0x1f0
[86.862836]        __cpuhp_setup_state_cpuslocked+0x111/0x320
[86.862852]        __cpuhp_setup_state+0xb0/0x220
[86.862866]        topology_sysfs_init+0x30/0x50
[86.862879]        do_one_initcall+0x60/0x3f0
[86.862893]        kernel_init_freeable+0x3cd/0x680
[86.862908]        kernel_init+0x1b/0x200
[86.862921]        ret_from_fork+0x47/0x70
[86.862934]        ret_from_fork_asm+0x1a/0x30
[86.862947]
-> Freescale#1 (cpuhp_state_mutex){+.+.}-{3:3}:
[86.862969]        __mutex_lock+0xaa/0xed0
[86.862982]        mutex_lock_nested+0x1b/0x30
[86.862995]        __cpuhp_setup_state_cpuslocked+0x67/0x320
[86.863012]        __cpuhp_setup_state+0xb0/0x220
[86.863026]        page_alloc_init_cpuhp+0x2d/0x60
[86.863041]        mm_core_init+0x22/0x2d0
[86.863054]        start_kernel+0x576/0xbd0
[86.863068]        x86_64_start_reservations+0x18/0x30
[86.863084]        x86_64_start_kernel+0xbf/0x110
[86.863098]        common_startup_64+0x13e/0x141
[86.863114]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
[86.863135]        __lock_acquire+0x1635/0x2810
[86.863152]        lock_acquire+0xc4/0x2f0
[86.863166]        cpus_read_lock+0x41/0x100
[86.863180]        stop_machine+0x1c/0x50
[86.863194]        bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.863987]        intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.864735]        __vma_bind+0x55/0x70 [i915]
[86.865510]        fence_work+0x26/0xa0 [i915]
[86.866248]        fence_notify+0xa1/0x140 [i915]
[86.866983]        __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.867719]        i915_sw_fence_commit+0x39/0x60 [i915]
[86.868453]        i915_vma_pin_ww+0x462/0x1360 [i915]
[86.869228]        i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.870001]        initial_plane_vma+0x307/0x840 [i915]
[86.870774]        intel_initial_plane_config+0x33f/0x670 [i915]
[86.871546]        intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.872330]        i915_driver_probe+0x7fa/0xe80 [i915]
[86.873057]        i915_pci_probe+0xe6/0x220 [i915]
[86.873782]        local_pci_probe+0x47/0xb0
[86.873802]        pci_device_probe+0xf3/0x260
[86.873817]        really_probe+0xf1/0x3c0
[86.873833]        __driver_probe_device+0x8c/0x180
[86.873848]        driver_probe_device+0x24/0xd0
[86.873862]        __driver_attach+0x10f/0x220
[86.873876]        bus_for_each_dev+0x7f/0xe0
[86.873892]        driver_attach+0x1e/0x30
[86.873904]        bus_add_driver+0x151/0x290
[86.873917]        driver_register+0x5e/0x130
[86.873931]        __pci_register_driver+0x7d/0x90
[86.873945]        i915_pci_register_driver+0x23/0x30 [i915]
[86.874678]        i915_init+0x37/0x120 [i915]
[86.875347]        do_one_initcall+0x60/0x3f0
[86.875369]        do_init_module+0x97/0x2a0
[86.875385]        load_module+0x2c54/0x2d80
[86.875398]        init_module_from_file+0x96/0xe0
[86.875413]        idempotent_init_module+0x117/0x330
[86.875426]        __x64_sys_finit_module+0x77/0x100
[86.875440]        x64_sys_call+0x24de/0x2660
[86.875454]        do_syscall_64+0x91/0xe90
[86.875470]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.875486]
other info that might help us debug this:
[86.875502] Chain exists of:
  cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[86.875539]  Possible unsafe locking scenario:
[86.875552]        CPU0                    CPU1
[86.875563]        ----                    ----
[86.875573]   lock(reservation_ww_class_mutex);
[86.875588]                                lock(reservation_ww_class_acquire);
[86.875606]                                lock(reservation_ww_class_mutex);
[86.875624]   rlock(cpu_hotplug_lock);
[86.875637]
 *** DEADLOCK ***
[86.875650] 3 locks held by i915_module_loa/1432:
[86.875663]  #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220
[86.875699]  Freescale#1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.876512]  Freescale#2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.877305]
stack backtrace:
[86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G     U              6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ Freescale#1 PREEMPT(voluntary)
[86.877334] Tainted: [U]=USER
[86.877336] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020
[86.877339] Call Trace:
[86.877344]  <TASK>
[86.877353]  dump_stack_lvl+0x91/0xf0
[86.877364]  dump_stack+0x10/0x20
[86.877369]  print_circular_bug+0x285/0x360
[86.877379]  check_noncircular+0x135/0x150
[86.877390]  __lock_acquire+0x1635/0x2810
[86.877403]  lock_acquire+0xc4/0x2f0
[86.877408]  ? stop_machine+0x1c/0x50
[86.877422]  ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878173]  cpus_read_lock+0x41/0x100
[86.878182]  ? stop_machine+0x1c/0x50
[86.878191]  ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878916]  stop_machine+0x1c/0x50
[86.878927]  bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.879652]  intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.880375]  __vma_bind+0x55/0x70 [i915]
[86.881133]  fence_work+0x26/0xa0 [i915]
[86.881851]  fence_notify+0xa1/0x140 [i915]
[86.882566]  __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.883286]  i915_sw_fence_commit+0x39/0x60 [i915]
[86.884003]  i915_vma_pin_ww+0x462/0x1360 [i915]
[86.884756]  ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915]
[86.885513]  i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.886281]  initial_plane_vma+0x307/0x840 [i915]
[86.887049]  intel_initial_plane_config+0x33f/0x670 [i915]
[86.887819]  intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.888587]  i915_driver_probe+0x7fa/0xe80 [i915]
[86.889293]  ? mutex_unlock+0x12/0x20
[86.889301]  ? drm_privacy_screen_get+0x171/0x190
[86.889308]  ? acpi_dev_found+0x66/0x80
[86.889321]  i915_pci_probe+0xe6/0x220 [i915]
[86.890038]  local_pci_probe+0x47/0xb0
[86.890049]  pci_device_probe+0xf3/0x260
[86.890058]  really_probe+0xf1/0x3c0
[86.890067]  __driver_probe_device+0x8c/0x180
[86.890072]  driver_probe_device+0x24/0xd0
[86.890078]  __driver_attach+0x10f/0x220
[86.890083]  ? __pfx___driver_attach+0x10/0x10
[86.890088]  bus_for_each_dev+0x7f/0xe0
[86.890097]  driver_attach+0x1e/0x30
[86.890101]  bus_add_driver+0x151/0x290
[86.890107]  driver_register+0x5e/0x130
[86.890113]  __pci_register_driver+0x7d/0x90
[86.890119]  i915_pci_register_driver+0x23/0x30 [i915]
[86.890833]  i915_init+0x37/0x120 [i915]
[86.891482]  ? __pfx_i915_init+0x10/0x10 [i915]
[86.892135]  do_one_initcall+0x60/0x3f0
[86.892145]  ? __kmalloc_cache_noprof+0x33f/0x470
[86.892157]  do_init_module+0x97/0x2a0
[86.892164]  load_module+0x2c54/0x2d80
[86.892168]  ? __kernel_read+0x15c/0x300
[86.892185]  ? kernel_read_file+0x2b1/0x320
[86.892195]  init_module_from_file+0x96/0xe0
[86.892199]  ? init_module_from_file+0x96/0xe0
[86.892211]  idempotent_init_module+0x117/0x330
[86.892224]  __x64_sys_finit_module+0x77/0x100
[86.892230]  x64_sys_call+0x24de/0x2660
[86.892236]  do_syscall_64+0x91/0xe90
[86.892243]  ? irqentry_exit+0x77/0xb0
[86.892249]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[86.892256]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.892261] RIP: 0033:0x7303e1b2725d
[86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d
[86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c
[86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80
[86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0
[86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710
[86.892304]  </TASK>

Call asynchronous variant of dma_fence_work_commit() in that case.

v3: Provide more verbose in-line comment (Andi),
  - mention target environments in commit message.

Fixes: 7d1c261 ("drm/i915: Take reservation lock around i915_vma_pin.")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985
Cc: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Acked-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20251023082925.351307-6-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 648ef13)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
efecanicoz-atlas pushed a commit to efecanicoz-atlas/linux-fslc that referenced this pull request Dec 9, 2025
commit 9d274c1 upstream.

We have been seeing crashes on duplicate keys in
btrfs_set_item_key_safe():

  BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192)
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.c:2620!
  invalid opcode: 0000 [Freescale#1] PREEMPT SMP PTI
  CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 Freescale#6
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs]

With the following stack trace:

  #0  btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4)
  Freescale#1  btrfs_drop_extents (fs/btrfs/file.c:411:4)
  Freescale#2  log_one_extent (fs/btrfs/tree-log.c:4732:9)
  Freescale#3  btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9)
  Freescale#4  btrfs_log_inode (fs/btrfs/tree-log.c:6626:9)
  Freescale#5  btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8)
  Freescale#6  btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8)
  Freescale#7  btrfs_sync_file (fs/btrfs/file.c:1933:8)
  Freescale#8  vfs_fsync_range (fs/sync.c:188:9)
  Freescale#9  vfs_fsync (fs/sync.c:202:9)
  Freescale#10 do_fsync (fs/sync.c:212:9)
  Freescale#11 __do_sys_fdatasync (fs/sync.c:225:9)
  Freescale#12 __se_sys_fdatasync (fs/sync.c:223:1)
  Freescale#13 __x64_sys_fdatasync (fs/sync.c:223:1)
  Freescale#14 do_syscall_x64 (arch/x86/entry/common.c:52:14)
  Freescale#15 do_syscall_64 (arch/x86/entry/common.c:83:7)
  Freescale#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)

So we're logging a changed extent from fsync, which is splitting an
extent in the log tree. But this split part already exists in the tree,
triggering the BUG().

This is the state of the log tree at the time of the crash, dumped with
drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py)
to get more details than btrfs_print_leaf() gives us:

  >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"])
  leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610
  leaf 33439744 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
          item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160
                  generation 7 transid 9 size 8192 nbytes 8473563889606862198
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 204 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417704.983333333 (2024-05-22 15:41:44)
                  mtime 1716417704.983333333 (2024-05-22 15:41:44)
                  otime 17592186044416.000000000 (559444-03-08 01:40:16)
          item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13
                  index 195 namelen 3 name: 193
          item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 4096 ram 12288
                  extent compression 0 (none)
          item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 4096 nr 8192
          item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096
  ...

So the real problem happened earlier: notice that items 4 (4k-12k) and 5
(8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and
item 5 starts at i_size.

Here is the state of the filesystem tree at the time of the crash:

  >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root
  >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0))
  >>> print_extent_buffer(nodes[0])
  leaf 30425088 level 0 items 184 generation 9 owner 5
  leaf 30425088 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
  	...
          item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160
                  generation 7 transid 7 size 4096 nbytes 12288
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 6 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417703.220000000 (2024-05-22 15:41:43)
                  mtime 1716417703.220000000 (2024-05-22 15:41:43)
                  otime 1716417703.220000000 (2024-05-22 15:41:43)
          item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13
                  index 195 namelen 3 name: 193
          item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 8192 ram 12288
                  extent compression 0 (none)
          item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096

Item 5 in the log tree corresponds to item 183 in the filesystem tree,
but nothing matches item 4. Furthermore, item 183 is the last item in
the leaf.

btrfs_log_prealloc_extents() is responsible for logging prealloc extents
beyond i_size. It first truncates any previously logged prealloc extents
that start beyond i_size. Then, it walks the filesystem tree and copies
the prealloc extent items to the log tree.

If it hits the end of a leaf, then it calls btrfs_next_leaf(), which
unlocks the tree and does another search. However, while the filesystem
tree is unlocked, an ordered extent completion may modify the tree. In
particular, it may insert an extent item that overlaps with an extent
item that was already copied to the log tree.

This may manifest in several ways depending on the exact scenario,
including an EEXIST error that is silently translated to a full sync,
overlapping items in the log tree, or this crash. This particular crash
is triggered by the following sequence of events:

- Initially, the file has i_size=4k, a regular extent from 0-4k, and a
  prealloc extent beyond i_size from 4k-12k. The prealloc extent item is
  the last item in its B-tree leaf.
- The file is fsync'd, which copies its inode item and both extent items
  to the log tree.
- An xattr is set on the file, which sets the
  BTRFS_INODE_COPY_EVERYTHING flag.
- The range 4k-8k in the file is written using direct I/O. i_size is
  extended to 8k, but the ordered extent is still in flight.
- The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this
  calls copy_inode_items_to_log(), which calls
  btrfs_log_prealloc_extents().
- btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the
  filesystem tree. Since it starts before i_size, it skips it. Since it
  is the last item in its B-tree leaf, it calls btrfs_next_leaf().
- btrfs_next_leaf() unlocks the path.
- The ordered extent completion runs, which converts the 4k-8k part of
  the prealloc extent to written and inserts the remaining prealloc part
  from 8k-12k.
- btrfs_next_leaf() does a search and finds the new prealloc extent
  8k-12k.
- btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into
  the log tree. Note that it overlaps with the 4k-12k prealloc extent
  that was copied to the log tree by the first fsync.
- fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k
  extent that was written.
- This tries to drop the range 4k-8k in the log tree, which requires
  adjusting the start of the 4k-12k prealloc extent in the log tree to
  8k.
- btrfs_set_item_key_safe() sees that there is already an extent
  starting at 8k in the log tree and calls BUG().

Fix this by detecting when we're about to insert an overlapping file
extent item in the log tree and truncating the part that would overlap.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ivitro pushed a commit to ivitro/linux-fslc that referenced this pull request Dec 10, 2025
[ Upstream commit 38f5024 ]

With CONFIG_PROVE_RCU_LIST=y and by executing

  $ netcat -l --sctp &
  $ netcat --sctp localhost &
  $ ss --sctp

one can trigger the following Lockdep-RCU splat(s):

  WARNING: suspicious RCU usage
  6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 Not tainted
  -----------------------------
  net/sctp/diag.c:76 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  2 locks held by ss/215:
   #0: ffff9c740828bec0 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{4:4}, at: __netlink_dump_start+0x84/0x2b0
   Freescale#1: ffff9c7401d72cd0 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_sock_dump+0x38/0x200

  stack backtrace:
  CPU: 0 UID: 0 PID: 215 Comm: ss Not tainted 6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x90
   lockdep_rcu_suspicious.cold+0x4e/0xa3
   inet_sctp_diag_fill.isra.0+0x4b1/0x5d0
   sctp_sock_dump+0x131/0x200
   sctp_transport_traverse_process+0x170/0x1b0
   ? __pfx_sctp_sock_filter+0x10/0x10
   ? __pfx_sctp_sock_dump+0x10/0x10
   sctp_diag_dump+0x103/0x140
   __inet_diag_dump+0x70/0xb0
   netlink_dump+0x148/0x490
   __netlink_dump_start+0x1f3/0x2b0
   inet_diag_handler_cmd+0xcd/0x100
   ? __pfx_inet_diag_dump_start+0x10/0x10
   ? __pfx_inet_diag_dump+0x10/0x10
   ? __pfx_inet_diag_dump_done+0x10/0x10
   sock_diag_rcv_msg+0x18e/0x320
   ? __pfx_sock_diag_rcv_msg+0x10/0x10
   netlink_rcv_skb+0x4d/0x100
   netlink_unicast+0x1d7/0x2b0
   netlink_sendmsg+0x203/0x450
   ____sys_sendmsg+0x30c/0x340
   ___sys_sendmsg+0x94/0xf0
   __sys_sendmsg+0x83/0xf0
   do_syscall_64+0xbb/0x390
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   ...
   </TASK>

Fixes: 8f840e4 ("sctp: add the sctp_diag.c file")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251028161506.3294376-2-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ivitro pushed a commit to ivitro/linux-fslc that referenced this pull request Dec 10, 2025
[ Upstream commit 84bbe32 ]

On completion of i915_vma_pin_ww(), a synchronous variant of
dma_fence_work_commit() is called.  When pinning a VMA to GGTT address
space on a Cherry View family processor, or on a Broxton generation SoC
with VTD enabled, i.e., when stop_machine() is then called from
intel_ggtt_bind_vma(), that can potentially lead to lock inversion among
reservation_ww and cpu_hotplug locks.

[86.861179] ======================================================
[86.861193] WARNING: possible circular locking dependency detected
[86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ Freescale#1 Tainted: G     U
[86.861226] ------------------------------------------------------
[86.861238] i915_module_loa/1432 is trying to acquire lock:
[86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50
[86.861290]
but task is already holding lock:
[86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.862233]
which lock already depends on the new lock.
[86.862251]
the existing dependency chain (in reverse order) is:
[86.862265]
-> Freescale#5 (reservation_ww_class_mutex){+.+.}-{3:3}:
[86.862292]        dma_resv_lockdep+0x19a/0x390
[86.862315]        do_one_initcall+0x60/0x3f0
[86.862334]        kernel_init_freeable+0x3cd/0x680
[86.862353]        kernel_init+0x1b/0x200
[86.862369]        ret_from_fork+0x47/0x70
[86.862383]        ret_from_fork_asm+0x1a/0x30
[86.862399]
-> Freescale#4 (reservation_ww_class_acquire){+.+.}-{0:0}:
[86.862425]        dma_resv_lockdep+0x178/0x390
[86.862440]        do_one_initcall+0x60/0x3f0
[86.862454]        kernel_init_freeable+0x3cd/0x680
[86.862470]        kernel_init+0x1b/0x200
[86.862482]        ret_from_fork+0x47/0x70
[86.862495]        ret_from_fork_asm+0x1a/0x30
[86.862509]
-> Freescale#3 (&mm->mmap_lock){++++}-{3:3}:
[86.862531]        down_read_killable+0x46/0x1e0
[86.862546]        lock_mm_and_find_vma+0xa2/0x280
[86.862561]        do_user_addr_fault+0x266/0x8e0
[86.862578]        exc_page_fault+0x8a/0x2f0
[86.862593]        asm_exc_page_fault+0x27/0x30
[86.862607]        filldir64+0xeb/0x180
[86.862620]        kernfs_fop_readdir+0x118/0x480
[86.862635]        iterate_dir+0xcf/0x2b0
[86.862648]        __x64_sys_getdents64+0x84/0x140
[86.862661]        x64_sys_call+0x1058/0x2660
[86.862675]        do_syscall_64+0x91/0xe90
[86.862689]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.862703]
-> Freescale#2 (&root->kernfs_rwsem){++++}-{3:3}:
[86.862725]        down_write+0x3e/0xf0
[86.862738]        kernfs_add_one+0x30/0x3c0
[86.862751]        kernfs_create_dir_ns+0x53/0xb0
[86.862765]        internal_create_group+0x134/0x4c0
[86.862779]        sysfs_create_group+0x13/0x20
[86.862792]        topology_add_dev+0x1d/0x30
[86.862806]        cpuhp_invoke_callback+0x4b5/0x850
[86.862822]        cpuhp_issue_call+0xbf/0x1f0
[86.862836]        __cpuhp_setup_state_cpuslocked+0x111/0x320
[86.862852]        __cpuhp_setup_state+0xb0/0x220
[86.862866]        topology_sysfs_init+0x30/0x50
[86.862879]        do_one_initcall+0x60/0x3f0
[86.862893]        kernel_init_freeable+0x3cd/0x680
[86.862908]        kernel_init+0x1b/0x200
[86.862921]        ret_from_fork+0x47/0x70
[86.862934]        ret_from_fork_asm+0x1a/0x30
[86.862947]
-> Freescale#1 (cpuhp_state_mutex){+.+.}-{3:3}:
[86.862969]        __mutex_lock+0xaa/0xed0
[86.862982]        mutex_lock_nested+0x1b/0x30
[86.862995]        __cpuhp_setup_state_cpuslocked+0x67/0x320
[86.863012]        __cpuhp_setup_state+0xb0/0x220
[86.863026]        page_alloc_init_cpuhp+0x2d/0x60
[86.863041]        mm_core_init+0x22/0x2d0
[86.863054]        start_kernel+0x576/0xbd0
[86.863068]        x86_64_start_reservations+0x18/0x30
[86.863084]        x86_64_start_kernel+0xbf/0x110
[86.863098]        common_startup_64+0x13e/0x141
[86.863114]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
[86.863135]        __lock_acquire+0x1635/0x2810
[86.863152]        lock_acquire+0xc4/0x2f0
[86.863166]        cpus_read_lock+0x41/0x100
[86.863180]        stop_machine+0x1c/0x50
[86.863194]        bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.863987]        intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.864735]        __vma_bind+0x55/0x70 [i915]
[86.865510]        fence_work+0x26/0xa0 [i915]
[86.866248]        fence_notify+0xa1/0x140 [i915]
[86.866983]        __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.867719]        i915_sw_fence_commit+0x39/0x60 [i915]
[86.868453]        i915_vma_pin_ww+0x462/0x1360 [i915]
[86.869228]        i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.870001]        initial_plane_vma+0x307/0x840 [i915]
[86.870774]        intel_initial_plane_config+0x33f/0x670 [i915]
[86.871546]        intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.872330]        i915_driver_probe+0x7fa/0xe80 [i915]
[86.873057]        i915_pci_probe+0xe6/0x220 [i915]
[86.873782]        local_pci_probe+0x47/0xb0
[86.873802]        pci_device_probe+0xf3/0x260
[86.873817]        really_probe+0xf1/0x3c0
[86.873833]        __driver_probe_device+0x8c/0x180
[86.873848]        driver_probe_device+0x24/0xd0
[86.873862]        __driver_attach+0x10f/0x220
[86.873876]        bus_for_each_dev+0x7f/0xe0
[86.873892]        driver_attach+0x1e/0x30
[86.873904]        bus_add_driver+0x151/0x290
[86.873917]        driver_register+0x5e/0x130
[86.873931]        __pci_register_driver+0x7d/0x90
[86.873945]        i915_pci_register_driver+0x23/0x30 [i915]
[86.874678]        i915_init+0x37/0x120 [i915]
[86.875347]        do_one_initcall+0x60/0x3f0
[86.875369]        do_init_module+0x97/0x2a0
[86.875385]        load_module+0x2c54/0x2d80
[86.875398]        init_module_from_file+0x96/0xe0
[86.875413]        idempotent_init_module+0x117/0x330
[86.875426]        __x64_sys_finit_module+0x77/0x100
[86.875440]        x64_sys_call+0x24de/0x2660
[86.875454]        do_syscall_64+0x91/0xe90
[86.875470]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.875486]
other info that might help us debug this:
[86.875502] Chain exists of:
  cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[86.875539]  Possible unsafe locking scenario:
[86.875552]        CPU0                    CPU1
[86.875563]        ----                    ----
[86.875573]   lock(reservation_ww_class_mutex);
[86.875588]                                lock(reservation_ww_class_acquire);
[86.875606]                                lock(reservation_ww_class_mutex);
[86.875624]   rlock(cpu_hotplug_lock);
[86.875637]
 *** DEADLOCK ***
[86.875650] 3 locks held by i915_module_loa/1432:
[86.875663]  #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220
[86.875699]  Freescale#1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.876512]  Freescale#2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.877305]
stack backtrace:
[86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G     U              6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ Freescale#1 PREEMPT(voluntary)
[86.877334] Tainted: [U]=USER
[86.877336] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020
[86.877339] Call Trace:
[86.877344]  <TASK>
[86.877353]  dump_stack_lvl+0x91/0xf0
[86.877364]  dump_stack+0x10/0x20
[86.877369]  print_circular_bug+0x285/0x360
[86.877379]  check_noncircular+0x135/0x150
[86.877390]  __lock_acquire+0x1635/0x2810
[86.877403]  lock_acquire+0xc4/0x2f0
[86.877408]  ? stop_machine+0x1c/0x50
[86.877422]  ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878173]  cpus_read_lock+0x41/0x100
[86.878182]  ? stop_machine+0x1c/0x50
[86.878191]  ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878916]  stop_machine+0x1c/0x50
[86.878927]  bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.879652]  intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.880375]  __vma_bind+0x55/0x70 [i915]
[86.881133]  fence_work+0x26/0xa0 [i915]
[86.881851]  fence_notify+0xa1/0x140 [i915]
[86.882566]  __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.883286]  i915_sw_fence_commit+0x39/0x60 [i915]
[86.884003]  i915_vma_pin_ww+0x462/0x1360 [i915]
[86.884756]  ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915]
[86.885513]  i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.886281]  initial_plane_vma+0x307/0x840 [i915]
[86.887049]  intel_initial_plane_config+0x33f/0x670 [i915]
[86.887819]  intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.888587]  i915_driver_probe+0x7fa/0xe80 [i915]
[86.889293]  ? mutex_unlock+0x12/0x20
[86.889301]  ? drm_privacy_screen_get+0x171/0x190
[86.889308]  ? acpi_dev_found+0x66/0x80
[86.889321]  i915_pci_probe+0xe6/0x220 [i915]
[86.890038]  local_pci_probe+0x47/0xb0
[86.890049]  pci_device_probe+0xf3/0x260
[86.890058]  really_probe+0xf1/0x3c0
[86.890067]  __driver_probe_device+0x8c/0x180
[86.890072]  driver_probe_device+0x24/0xd0
[86.890078]  __driver_attach+0x10f/0x220
[86.890083]  ? __pfx___driver_attach+0x10/0x10
[86.890088]  bus_for_each_dev+0x7f/0xe0
[86.890097]  driver_attach+0x1e/0x30
[86.890101]  bus_add_driver+0x151/0x290
[86.890107]  driver_register+0x5e/0x130
[86.890113]  __pci_register_driver+0x7d/0x90
[86.890119]  i915_pci_register_driver+0x23/0x30 [i915]
[86.890833]  i915_init+0x37/0x120 [i915]
[86.891482]  ? __pfx_i915_init+0x10/0x10 [i915]
[86.892135]  do_one_initcall+0x60/0x3f0
[86.892145]  ? __kmalloc_cache_noprof+0x33f/0x470
[86.892157]  do_init_module+0x97/0x2a0
[86.892164]  load_module+0x2c54/0x2d80
[86.892168]  ? __kernel_read+0x15c/0x300
[86.892185]  ? kernel_read_file+0x2b1/0x320
[86.892195]  init_module_from_file+0x96/0xe0
[86.892199]  ? init_module_from_file+0x96/0xe0
[86.892211]  idempotent_init_module+0x117/0x330
[86.892224]  __x64_sys_finit_module+0x77/0x100
[86.892230]  x64_sys_call+0x24de/0x2660
[86.892236]  do_syscall_64+0x91/0xe90
[86.892243]  ? irqentry_exit+0x77/0xb0
[86.892249]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[86.892256]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.892261] RIP: 0033:0x7303e1b2725d
[86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d
[86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c
[86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80
[86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0
[86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710
[86.892304]  </TASK>

Call asynchronous variant of dma_fence_work_commit() in that case.

v3: Provide more verbose in-line comment (Andi),
  - mention target environments in commit message.

Fixes: 7d1c261 ("drm/i915: Take reservation lock around i915_vma_pin.")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985
Cc: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Acked-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20251023082925.351307-6-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 648ef13)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
fedepell pushed a commit to fedepell/linux-fslc that referenced this pull request Dec 11, 2025
commit 0570327 upstream.

Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  Freescale#1 [3800313fb88] pci_remove_bus_device at c915105ba
  Freescale#2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  Freescale#3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  Freescale#4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  Freescale#5 [3800313fca0] __zpci_event_availability at c90fb3dca
  Freescale#6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  Freescale#7 [3800313fd60] crw_collect_info at c91905822
  Freescale#8 [3800313fe10] kthread at c90feb390
  Freescale#9 [3800313fe68] __ret_from_fork at c90f6aa64
  Freescale#10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This being the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible of course though given that this lack of
locking persisted so long observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fedepell pushed a commit to fedepell/linux-fslc that referenced this pull request Dec 11, 2025
[ Upstream commit 4840cea ]

The syzbot reported issue in __hfsplus_ext_cache_extent():

[   70.194323][ T9350] BUG: KMSAN: uninit-value in __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195022][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195530][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.195998][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.196458][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.196959][ T9350]  cont_write_begin+0x1000/0x1950
[   70.197416][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.197873][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.198374][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.198892][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.199393][ T9350]  vfs_write+0xb0f/0x14e0
[   70.199771][ T9350]  ksys_write+0x23e/0x490
[   70.200149][ T9350]  __x64_sys_write+0x97/0xf0
[   70.200570][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.201065][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.201506][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.202054][ T9350]
[   70.202279][ T9350] Uninit was created at:
[   70.202693][ T9350]  __kmalloc_noprof+0x621/0xf80
[   70.203149][ T9350]  hfsplus_find_init+0x8d/0x1d0
[   70.203602][ T9350]  hfsplus_file_extend+0x6ca/0x1cf0
[   70.204087][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.204561][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.205074][ T9350]  cont_write_begin+0x1000/0x1950
[   70.205547][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.206017][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.206519][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.207042][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.207552][ T9350]  vfs_write+0xb0f/0x14e0
[   70.207961][ T9350]  ksys_write+0x23e/0x490
[   70.208375][ T9350]  __x64_sys_write+0x97/0xf0
[   70.208810][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.209255][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.209680][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.210230][ T9350]
[   70.210454][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Not tainted 6.12.0-rc5 Freescale#5
[   70.211174][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.212115][ T9350] =====================================================
[   70.212734][ T9350] Disabling lock debugging due to kernel taint
[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic set ...
[   70.213858][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Tainted: G    B              6.12.0-rc5 Freescale#5
[   70.214679][ T9350] Tainted: [B]=BAD_PAGE
[   70.215057][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.215999][ T9350] Call Trace:
[   70.216309][ T9350]  <TASK>
[   70.216585][ T9350]  dump_stack_lvl+0x1fd/0x2b0
[   70.217025][ T9350]  dump_stack+0x1e/0x30
[   70.217421][ T9350]  panic+0x502/0xca0
[   70.217803][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0

[   70.218294][ Message fromT sy9350]  kmsan_report+0x296/slogd@syzkaller 0x2aat Aug 18 22:11:058 ...
 kernel
:[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic [   70.220179][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
set ...
[   70.221254][ T9350]  ? __msan_warning+0x96/0x120
[   70.222066][ T9350]  ? __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.223023][ T9350]  ? hfsplus_file_extend+0x74f/0x1cf0
[   70.224120][ T9350]  ? hfsplus_get_block+0xe16/0x17b0
[   70.224946][ T9350]  ? __block_write_begin_int+0x962/0x2ce0
[   70.225756][ T9350]  ? cont_write_begin+0x1000/0x1950
[   70.226337][ T9350]  ? hfsplus_write_begin+0x85/0x130
[   70.226852][ T9350]  ? generic_perform_write+0x3e8/0x1060
[   70.227405][ T9350]  ? __generic_file_write_iter+0x215/0x460
[   70.227979][ T9350]  ? generic_file_write_iter+0x109/0x5e0
[   70.228540][ T9350]  ? vfs_write+0xb0f/0x14e0
[   70.228997][ T9350]  ? ksys_write+0x23e/0x490
[   70.229458][ T9350]  ? __x64_sys_write+0x97/0xf0
[   70.229939][ T9350]  ? x64_sys_call+0x3015/0x3cf0
[   70.230432][ T9350]  ? do_syscall_64+0xd9/0x1d0
[   70.230941][ T9350]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.231926][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.232738][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.233711][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.234516][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.235398][ T9350]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.236323][ T9350]  ? hfsplus_brec_find+0x218/0x9f0
[   70.237090][ T9350]  ? __pfx_hfs_find_rec_by_key+0x10/0x10
[   70.237938][ T9350]  ? __msan_instrument_asm_store+0xbf/0xf0
[   70.238827][ T9350]  ? __msan_metadata_ptr_for_store_4+0x27/0x40
[   70.239772][ T9350]  ? __hfsplus_ext_write_extent+0x536/0x620
[   70.240666][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.241175][ T9350]  __msan_warning+0x96/0x120
[   70.241645][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.242223][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.242748][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.243255][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.243878][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.244400][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.244967][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.245531][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.246079][ T9350]  cont_write_begin+0x1000/0x1950
[   70.246598][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.247105][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.247650][ T9350]  ? __pfx_hfsplus_write_begin+0x10/0x10
[   70.248211][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.248752][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.249314][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.249856][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.250487][ T9350]  vfs_write+0xb0f/0x14e0
[   70.250930][ T9350]  ? __pfx_generic_file_write_iter+0x10/0x10
[   70.251530][ T9350]  ksys_write+0x23e/0x490
[   70.251974][ T9350]  __x64_sys_write+0x97/0xf0
[   70.252450][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.252924][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.253384][ T9350]  ? irqentry_exit+0x16/0x60
[   70.253844][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.254430][ T9350] RIP: 0033:0x7f7a92adffc9
[   70.254873][ T9350] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[   70.256674][ T9350] RSP: 002b:00007fff0bca3188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   70.257485][ T9350] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a92adffc9
[   70.258246][ T9350] RDX: 000000000208e24b RSI: 0000000020000100 RDI: 0000000000000004
[   70.258998][ T9350] RBP: 00007fff0bca31a0 R08: 00007fff0bca31a0 R09: 00007fff0bca31a0
[   70.259769][ T9350] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e0d75f8250
[   70.260520][ T9350] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   70.261286][ T9350]  </TASK>
[   70.262026][ T9350] Kernel Offset: disabled

(gdb) l *__hfsplus_ext_cache_extent+0x7d0
0xffffffff8318aef0 is in __hfsplus_ext_cache_extent (fs/hfsplus/extents.c:168).
163		fd->key->ext.cnid = 0;
164		res = hfs_brec_find(fd, hfs_find_rec_by_key);
165		if (res && res != -ENOENT)
166			return res;
167		if (fd->key->ext.cnid != fd->search_key->ext.cnid ||
168		    fd->key->ext.fork_type != fd->search_key->ext.fork_type)
169			return -ENOENT;
170		if (fd->entrylength != sizeof(hfsplus_extent_rec))
171			return -EIO;
172		hfs_bnode_read(fd->bnode, extent, fd->entryoffset,

The __hfsplus_ext_cache_extent() calls __hfsplus_ext_read_extent():

res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
				block, HFSPLUS_IS_RSRC(inode) ?
					HFSPLUS_TYPE_RSRC :
					HFSPLUS_TYPE_DATA);

And if inode->i_ino could be equal to zero or any non-available CNID,
then hfs_brec_find() could not find the record in the tree. As a result,
fd->key could be compared with fd->search_key. But hfsplus_find_init()
uses kmalloc() for fd->key and fd->search_key allocation:

int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
{
<skipped>
        ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL);
        if (!ptr)
                return -ENOMEM;
        fd->search_key = ptr;
        fd->key = ptr + tree->max_key_len + 2;
<skipped>
}

Finally, fd->key is still not initialized if hfs_brec_find()
has found nothing.

This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.

Reported-by: syzbot <syzbot+55ad87f38795d6787521@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=55ad87f38795d6787521
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225232.126402-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
fedepell pushed a commit to fedepell/linux-fslc that referenced this pull request Dec 11, 2025
[ Upstream commit 38f5024 ]

With CONFIG_PROVE_RCU_LIST=y and by executing

  $ netcat -l --sctp &
  $ netcat --sctp localhost &
  $ ss --sctp

one can trigger the following Lockdep-RCU splat(s):

  WARNING: suspicious RCU usage
  6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 Not tainted
  -----------------------------
  net/sctp/diag.c:76 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  2 locks held by ss/215:
   #0: ffff9c740828bec0 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{4:4}, at: __netlink_dump_start+0x84/0x2b0
   Freescale#1: ffff9c7401d72cd0 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_sock_dump+0x38/0x200

  stack backtrace:
  CPU: 0 UID: 0 PID: 215 Comm: ss Not tainted 6.18.0-rc1-00093-g7f864458e9a6 Freescale#5 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x90
   lockdep_rcu_suspicious.cold+0x4e/0xa3
   inet_sctp_diag_fill.isra.0+0x4b1/0x5d0
   sctp_sock_dump+0x131/0x200
   sctp_transport_traverse_process+0x170/0x1b0
   ? __pfx_sctp_sock_filter+0x10/0x10
   ? __pfx_sctp_sock_dump+0x10/0x10
   sctp_diag_dump+0x103/0x140
   __inet_diag_dump+0x70/0xb0
   netlink_dump+0x148/0x490
   __netlink_dump_start+0x1f3/0x2b0
   inet_diag_handler_cmd+0xcd/0x100
   ? __pfx_inet_diag_dump_start+0x10/0x10
   ? __pfx_inet_diag_dump+0x10/0x10
   ? __pfx_inet_diag_dump_done+0x10/0x10
   sock_diag_rcv_msg+0x18e/0x320
   ? __pfx_sock_diag_rcv_msg+0x10/0x10
   netlink_rcv_skb+0x4d/0x100
   netlink_unicast+0x1d7/0x2b0
   netlink_sendmsg+0x203/0x450
   ____sys_sendmsg+0x30c/0x340
   ___sys_sendmsg+0x94/0xf0
   __sys_sendmsg+0x83/0xf0
   do_syscall_64+0xbb/0x390
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   ...
   </TASK>

Fixes: 8f840e4 ("sctp: add the sctp_diag.c file")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251028161506.3294376-2-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Dec 17, 2025
commit 3ebcd34 upstream.

A process might fail to allocate a new bitmap when trying to expand its
proc->dmap. In that case, dbitmap_grow() fails and frees the old bitmap
via dbitmap_free(). However, the driver calls dbitmap_free() again when
the same process terminates, leading to a double-free error:

  ==================================================================
  BUG: KASAN: double-free in binder_proc_dec_tmpref+0x2e0/0x55c
  Free of addr ffff00000b7c1420 by task kworker/9:1/209

  CPU: 9 UID: 0 PID: 209 Comm: kworker/9:1 Not tainted 6.17.0-rc6-dirty #5 PREEMPT
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events binder_deferred_func
  Call trace:
   kfree+0x164/0x31c
   binder_proc_dec_tmpref+0x2e0/0x55c
   binder_deferred_func+0xc24/0x1120
   process_one_work+0x520/0xba4
  [...]

  Allocated by task 448:
   __kmalloc_noprof+0x178/0x3c0
   bitmap_zalloc+0x24/0x30
   binder_open+0x14c/0xc10
  [...]

  Freed by task 449:
   kfree+0x184/0x31c
   binder_inc_ref_for_node+0xb44/0xe44
   binder_transaction+0x29b4/0x7fbc
   binder_thread_write+0x1708/0x442c
   binder_ioctl+0x1b50/0x2900
  [...]
  ==================================================================

Fix this issue by marking proc->map NULL in dbitmap_free().

Cc: stable@vger.kernel.org
Fixes: 15d9da3 ("binder: use bitmap for faster descriptor lookup")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Tiffany Yang <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250915221248.3470154-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
otavio pushed a commit that referenced this pull request Dec 17, 2025
[ Upstream commit 48918ca ]

The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    #6 0x565358f6682e in run_test_child builtin-test.c:344
    #7 0x565358ef7121 in start_command run-command.c:128
    #8 0x565358f67273 in start_test builtin-test.c:545
    #9 0x565358f6771d in __cmd_test builtin-test.c:647
    #10 0x565358f682bd in cmd_test builtin-test.c:849
    #11 0x565358ee5ded in run_builtin perf.c:349
    #12 0x565358ee6085 in handle_internal_command perf.c:401
    #13 0x565358ee61de in run_argv perf.c:448
    #14 0x565358ee6527 in main perf.c:555
    #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
otavio pushed a commit that referenced this pull request Dec 17, 2025
commit 0570327 upstream.

Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This being the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible of course though given that this lack of
locking persisted so long observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.