
@gusenkovs

FSM Subsystem for the Linux kernel.

http://fsmos.ru/en

rogerq pushed a commit to rogerq/linux that referenced this pull request Jan 9, 2017
… recovery

The keystone remoteproc driver performs error recovery by queueing a
work item from the keystone_rproc_exception_interrupt() handler when
using the in-kernel remoteproc core loader/boot mechanism. This
interrupt is currently registered with IRQF_ONESHOT, which results in a
"scheduling while atomic" BUG when running on RT-Linux. Oneshot
interrupts keep the irq line masked until the threaded handler has
finished, and queueing the work uses spinlocks for synchronization,
which are converted to rt_mutexes on RT. So, fix this by not using
IRQF_ONESHOT when requesting the interrupt. When the userspace-based
load/boot mechanism is used, this interrupt is processed by the UIO
framework, and that path doesn't need any changes.

 remoteproc0: crash detected in 10800000.dsp0: type device exception
 BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
 in_atomic(): 1, irqs_disabled(): 128, pid: 53, name: irq/66-soc:keys
 1 lock held by irq/66-soc:keys/53:
 #0:  (&kirq->wa_lock){......}, at: [<c031773c>] keystone_irq_handler+0xec/0x240
 irq event stamp: 170018
 hardirqs last  enabled at (170017): [<c06b2920>] _raw_spin_unlock_irqrestore+0x78/0x80
 hardirqs last disabled at (170018): [<c06b2734>] _raw_spin_lock_irqsave+0x1c/0x58
 softirqs last  enabled at (0): [<c0023664>] copy_process+0x2bc/0x1678
 softirqs last disabled at (0): [<  (null)>]   (null)
 Preemption disabled at:[<  (null)>]   (null)

 CPU: 0 PID: 53 Comm: irq/66-soc:keys Tainted: G        W       4.4.36-rt43-03400-gba94d7c1a7fa #331
 Hardware name: Keystone
 [<c0017568>] (unwind_backtrace) from [<c00139e0>] (show_stack+0x10/0x14)
 [<c00139e0>] (show_stack) from [<c02e4c00>] (dump_stack+0x98/0xc4)
 [<c02e4c00>] (dump_stack) from [<c06b2ca0>] (rt_spin_lock+0x24/0x5c)
 [<c06b2ca0>] (rt_spin_lock) from [<c003b254>] (queue_work_on+0x60/0x194)
 [<c003b254>] (queue_work_on) from [<bf04e488>] (keystone_rproc_exception_interrupt+0x10/0x18 [keystone_remoteproc])
 [<bf04e488>] (keystone_rproc_exception_interrupt [keystone_remoteproc]) from [<c0081ff8>] (handle_irq_event_percpu+0x8c/0x178)
 [<c0081ff8>] (handle_irq_event_percpu) from [<c008211c>] (handle_irq_event+0x38/0x5c)
 [<c008211c>] (handle_irq_event) from [<c0085384>] (handle_level_irq+0xc4/0x168)
 [<c0085384>] (handle_level_irq) from [<c0081654>] (generic_handle_irq+0x24/0x34)
 [<c0081654>] (generic_handle_irq) from [<c0317748>] (keystone_irq_handler+0xf8/0x240)
 [<c0317748>] (keystone_irq_handler) from [<c00830f8>] (irq_forced_thread_fn+0x20/0x74)
 [<c00830f8>] (irq_forced_thread_fn) from [<c0083470>] (irq_thread+0x15c/0x230)
 [<c0083470>] (irq_thread) from [<c0044898>] (kthread+0xf0/0x108)
 [<c0044898>] (kthread) from [<c00102d0>] (ret_from_fork+0x14/0x24)
 remoteproc0: handling crash #1 in 10800000.dsp0!!
 remoteproc0: recovering 10800000.dsp0
 remoteproc0: stopped remote processor 10800000.dsp0
 remoteproc0: powering up 10800000.dsp0
 remoteproc0: Booting fw image keystone-dsp0-fw, size 3704928
 remoteproc0: remote processor 10800000.dsp0 is now up
 virtio_rpmsg_bus virtio0: rpmsg host is online
 virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
 remoteproc0: registered virtio0 (type 7)

Signed-off-by: Suman Anna <s-anna@ti.com>
laijs added a commit to laijs/linux that referenced this pull request Feb 16, 2017
Fix torvalds#331

test_getdents64() doesn't check the return value of snprintf(), so it
ends up overflowing the stack when it keeps calling snprintf().

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
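The failure mode above comes from advancing an output position by snprintf()'s return value without checking it. snprintf() returns the length the complete string would have had, not the number of bytes it actually stored, so an unchecked position can run past the end of the buffer. A minimal userspace sketch of the checked pattern; append_name() is a hypothetical helper, not the selftest's actual code:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append "name\n" to buf at offset *pos.
 * Returns 0 on success, -1 on an encoding error or if the output would
 * be truncated. In the failure case *pos is left unchanged, so it can
 * never be advanced past the end of buf.
 */
static int append_name(char *buf, size_t size, size_t *pos, const char *name)
{
	size_t remaining;
	int n;

	if (*pos >= size)
		return -1;
	remaining = size - *pos;

	/* n is the length the full string WOULD have, excluding the NUL. */
	n = snprintf(buf + *pos, remaining, "%s\n", name);
	if (n < 0 || (size_t)n >= remaining)
		return -1;	/* error or truncated: caller should stop */

	*pos += (size_t)n;
	return 0;
}
```

In this sketch the caller is expected to stop appending as soon as append_name() returns -1, rather than continuing with a position that no longer points inside the buffer.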
tobetter pushed a commit to tobetter/linux that referenced this pull request Dec 23, 2017
Netconsole support for XU4's network card
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 10, 2019
Inside print_request(), we query the context/timeline name. Nothing
immediately protects the context from being freed if the request is
complete -- we rely on serialisation by the caller to keep the name
valid until they finish using it. Inside intel_engine_dump(), we
generally only print the requests in the execution queue protected by the
engine->active.lock, but we also show the pending execlists ports which
are not protected and so require an rcu_read_lock to keep the pointer
valid.

[ 1695.700883] BUG: KASAN: use-after-free in i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.700981] Read of size 8 at addr ffff8887344f4d50 by task gem_ctx_persist/2968
[ 1695.701068]
[ 1695.701156] CPU: 1 PID: 2968 Comm: gem_ctx_persist Tainted: G     U            5.4.0-rc6+ #331
[ 1695.701246] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 1695.701334] Call Trace:
[ 1695.701424]  dump_stack+0x5b/0x90
[ 1695.701870]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.701964]  print_address_description.constprop.7+0x36/0x50
[ 1695.702408]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.702856]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.702947]  __kasan_report.cold.10+0x1a/0x3a
[ 1695.703390]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.703836]  i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.704241]  print_request+0x82/0x2e0 [i915]
[ 1695.704638]  ? fwtable_read32+0x133/0x360 [i915]
[ 1695.705042]  ? write_timestamp+0x110/0x110 [i915]
[ 1695.705133]  ? _raw_spin_lock_irqsave+0x79/0xc0
[ 1695.705221]  ? refcount_inc_not_zero_checked+0x91/0x110
[ 1695.705306]  ? refcount_dec_and_mutex_lock+0x50/0x50
[ 1695.705709]  ? intel_engine_find_active_request+0x202/0x230 [i915]
[ 1695.706115]  intel_engine_dump+0x2c9/0x900 [i915]

Fixes: c36eebd ("drm/i915/gt: execlists->active is serialised by the tasklet")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 12, 2019
Inside print_request(), we query the context/timeline name. Nothing
immediately protects the context from being freed if the request is
complete -- we rely on serialisation by the caller to keep the name
valid until they finish using it. Inside intel_engine_dump(), we
generally only print the requests in the execution queue protected by the
engine->active.lock, but we also show the pending execlists ports which
are not protected and so require an rcu_read_lock to keep the pointer
valid.

[ 1695.700883] BUG: KASAN: use-after-free in i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.700981] Read of size 8 at addr ffff8887344f4d50 by task gem_ctx_persist/2968
[ 1695.701068]
[ 1695.701156] CPU: 1 PID: 2968 Comm: gem_ctx_persist Tainted: G     U            5.4.0-rc6+ #331
[ 1695.701246] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 1695.701334] Call Trace:
[ 1695.701424]  dump_stack+0x5b/0x90
[ 1695.701870]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.701964]  print_address_description.constprop.7+0x36/0x50
[ 1695.702408]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.702856]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.702947]  __kasan_report.cold.10+0x1a/0x3a
[ 1695.703390]  ? i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.703836]  i915_fence_get_timeline_name+0x53/0x90 [i915]
[ 1695.704241]  print_request+0x82/0x2e0 [i915]
[ 1695.704638]  ? fwtable_read32+0x133/0x360 [i915]
[ 1695.705042]  ? write_timestamp+0x110/0x110 [i915]
[ 1695.705133]  ? _raw_spin_lock_irqsave+0x79/0xc0
[ 1695.705221]  ? refcount_inc_not_zero_checked+0x91/0x110
[ 1695.705306]  ? refcount_dec_and_mutex_lock+0x50/0x50
[ 1695.705709]  ? intel_engine_find_active_request+0x202/0x230 [i915]
[ 1695.706115]  intel_engine_dump+0x2c9/0x900 [i915]

Fixes: c36eebd ("drm/i915/gt: execlists->active is serialised by the tasklet")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111114323.5833-1-chris@chris-wilson.co.uk
(cherry picked from commit fecffa4)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
@torvalds torvalds closed this Sep 22, 2025
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 24, 2025
I've been chasing down the following flaky splat, introduced by recent
changes in BTF generation [1]:

  ------------[ cut here ]------------
  BUG: unable to handle page fault for address: ffa000000233d828
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G        W  OE       6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
  Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
  RIP: 0010:simplify_symbols+0x2b2/0x480
     9.737179] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
  RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
  RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
  RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
  RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
  R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
  R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
  FS:  00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __kmalloc_node_track_caller_noprof+0x37f/0x740
   ? __pfx_setup_modinfo_srcversion+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? kstrdup+0x4a/0x70
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? setup_modinfo_srcversion+0x1a/0x30
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? setup_modinfo+0x12b/0x1e0
   load_module+0x133a/0x1610
   __x64_sys_finit_module+0x31b/0x450
   ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
   do_syscall_64+0x80/0x2d0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x95/0xc0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f1c63a2582d
     9.794028] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
  RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
  RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
  R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
   </TASK>
  Modules linked in: bpf_testmod(OE)
  CR2: ffa000000233d828
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:simplify_symbols+0x2b2/0x480
     9.821595] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
  RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
  RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
  RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
  RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
  R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
  R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
  FS:  00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
  PKRU: 55555554
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

So far this hasn't happened on BPF CI, for example; however, I was able
to reproduce it on a particular x64 machine using a kernel built with
LLVM 20.

The crash happens when attempting to load one of the BPF selftest modules
(tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko), which
is used by the kfunc_module_order test.

The reason for the crash is that simplify_symbols() doesn't bounds-check
the ELF section index:

	for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
		const char *name = info->strtab + sym[i].st_name;

		switch (sym[i].st_shndx) {
		case SHN_COMMON:

		[...]

		default:
			/* Divert to percpu allocation if a percpu var. */
			if (sym[i].st_shndx == info->index.pcpu)
				secbase = (unsigned long)mod_percpu(mod);
			else
  /** HERE --> **/		secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
			sym[i].st_value += secbase;
			break;
		}
	}

In the case I was able to reproduce, st_shndx had the value 0xffff
(SHN_HIRESERVE aka SHN_XINDEX [2]), which fell through to the default branch.

This code fragment is 15 to 20 years old, so a kernel module symbol is
evidently not expected to have such an st_shndx value. Even so, the
kernel should probably fail to load the module rather than crash, which
is what this patch attempts to fix.
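The kind of check argued for here can be sketched in userspace with glibc's <elf.h> types. sym_secbase() is a hypothetical helper modelled on the default: branch quoted above; it is not the actual patch, and shnum stands in for the section-header count (e_shnum) that a real loader has available. Special indices such as SHN_ABS and SHN_COMMON are assumed to be handled by earlier cases, as in the elided part of the switch.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: look up the base address of the section a symbol
 * lives in, but refuse any st_shndx that does not name a real section
 * header instead of indexing sechdrs[] with it.
 * Returns 0 and sets *secbase on success, -1 if the index is invalid.
 */
static int sym_secbase(const Elf64_Shdr *sechdrs, size_t shnum,
		       const Elf64_Sym *sym, Elf64_Addr *secbase)
{
	Elf64_Section idx = sym->st_shndx;

	/*
	 * SHN_UNDEF is not a real section, and the reserved range
	 * SHN_LORESERVE..SHN_HIRESERVE (0xff00..0xffff) includes SHN_XINDEX.
	 * Anything at or beyond shnum would read past the header table.
	 */
	if (idx == SHN_UNDEF || idx >= SHN_LORESERVE || (size_t)idx >= shnum)
		return -1;	/* caller should fail the load, e.g. -ENOEXEC */

	*secbase = sechdrs[idx].sh_addr;
	return 0;
}
```

With a check like this, the 0xffff value seen in the reproduction would be reported as an error rather than used as an out-of-bounds index into the section header array.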

Investigating further, I discovered that the module binary was
corrupted by the `${OBJCOPY} --update-section` operation that updates
the .BTF_ids section data in scripts/gen-btf.sh. This explains why the
bug surfaced after gen-btf.sh was introduced:

  $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
  llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (2), but unable to locate the extended symbol index table
  llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (3), but unable to locate the extended symbol index table
  llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (4), but unable to locate the extended symbol index table
       3: 0000000000000000    16 NOTYPE  LOCAL  DEFAULT   RSV[0xffff] __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
  llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (16), but unable to locate the extended symbol index table
       4: 0000000000000008     4 OBJECT  LOCAL  DEFAULT   RSV[0xffff] __BTF_ID__func__bpf_test_modorder_retx__44417

vs expected

  $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
       3: 0000000000000000    16 NOTYPE  LOCAL  DEFAULT     6 __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
       4: 0000000000000008     4 OBJECT  LOCAL  DEFAULT     6 __BTF_ID__func__bpf_test_modorder_retx__44417
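For context on the llvm-readelf warnings: in the ELF gABI, an st_shndx of SHN_XINDEX is an escape value meaning the real section index is stored in a separate SHT_SYMTAB_SHNDX section, an array of 32-bit words parallel to the symbol table. The corrupted module has SHN_XINDEX symbols but no such table. A minimal sketch of how a reader resolves the index, assuming glibc's <elf.h>; sym_section_index() is a hypothetical helper:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the real section index of symbol i.
 * xndx points at the SHT_SYMTAB_SHNDX contents (one Elf32_Word per
 * symbol), or is NULL when the object has no such section.
 * Returns -1 if the index cannot be resolved. Other special values
 * (SHN_ABS, SHN_COMMON, ...) are passed through for the caller.
 */
static int64_t sym_section_index(const Elf64_Sym *syms, const Elf32_Word *xndx,
				 size_t nsyms, size_t i)
{
	if (i >= nsyms)
		return -1;
	if (syms[i].st_shndx != SHN_XINDEX)
		return syms[i].st_shndx;
	if (xndx == NULL)
		return -1;	/* dangling SHN_XINDEX: treat the object as corrupt */
	return xndx[i];
}
```

The `xndx == NULL` branch corresponds to the "unable to locate the extended symbol index table" situation reported above.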

But why? Updating section data without changing its size is not
supposed to affect section indices, right?

With a bit more testing I confirmed that this is an LLVM-specific
issue (it doesn't reproduce with a GCC kbuild) and that it isn't
consistent, because in link-vmlinux.sh we also do:

    ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}

However:

  $ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep 0xffff
  # no output, which is good

So the suspect is llvm-objcopy's implementation. As it turns out,
there is a relevant known bug that explains the flakiness and hasn't
been fixed yet [3].

[1] https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
[2] https://man7.org/linux/man-pages/man5/elf.5.html
[3] llvm/llvm-project#168060 (comment)

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>