FSM Client and Server #331

gusenkovs · 2016-10-08T15:19:28Z

FSM subsystem for the Linux kernel.

… recovery

The keystone remoteproc driver performs an error recovery by scheduling a workqueue from the keystone_remoteproc_exception_interrupt() handler when using in-kernel remoteproc core loader/boot mechanism. This interrupt is registered with IRQF_ONESHOT at the moment, and it results in a "scheduling while atomic" BUG when running on RT-Linux. Oneshot interrupts keep the irq line masked until the threaded handler has finished, and the workqueue scheduling uses spinlocks for synchronization which get transformed to rt_mutexes on RT. So, fix this by not using IRQF_ONESHOT while requesting the interrupt.

This interrupt is processed by UIO framework when using the userspace based load/boot mechanism, and doesn't need any changes in that path.

    remoteproc0: crash detected in 10800000.dsp0: type device exception
    BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
    in_atomic(): 1, irqs_disabled(): 128, pid: 53, name: irq/66-soc:keys
    1 lock held by irq/66-soc:keys/53:
     #0: (&kirq->wa_lock){......}, at: [<c031773c>] keystone_irq_handler+0xec/0x240
    irq event stamp: 170018
    hardirqs last enabled at (170017): [<c06b2920>] _raw_spin_unlock_irqrestore+0x78/0x80
    hardirqs last disabled at (170018): [<c06b2734>] _raw_spin_lock_irqsave+0x1c/0x58
    softirqs last enabled at (0): [<c0023664>] copy_process+0x2bc/0x1678
    softirqs last disabled at (0): [< (null)>] (null)
    Preemption disabled at:[< (null)>] (null)
    CPU: 0 PID: 53 Comm: irq/66-soc:keys Tainted: G W 4.4.36-rt43-03400-gba94d7c1a7fa torvalds#331
    Hardware name: Keystone
    [<c0017568>] (unwind_backtrace) from [<c00139e0>] (show_stack+0x10/0x14)
    [<c00139e0>] (show_stack) from [<c02e4c00>] (dump_stack+0x98/0xc4)
    [<c02e4c00>] (dump_stack) from [<c06b2ca0>] (rt_spin_lock+0x24/0x5c)
    [<c06b2ca0>] (rt_spin_lock) from [<c003b254>] (queue_work_on+0x60/0x194)
    [<c003b254>] (queue_work_on) from [<bf04e488>] (keystone_rproc_exception_interrupt+0x10/0x18 [keystone_remoteproc])
    [<bf04e488>] (keystone_rproc_exception_interrupt [keystone_remoteproc]) from [<c0081ff8>] (handle_irq_event_percpu+0x8c/0x178)
    [<c0081ff8>] (handle_irq_event_percpu) from [<c008211c>] (handle_irq_event+0x38/0x5c)
    [<c008211c>] (handle_irq_event) from [<c0085384>] (handle_level_irq+0xc4/0x168)
    [<c0085384>] (handle_level_irq) from [<c0081654>] (generic_handle_irq+0x24/0x34)
    [<c0081654>] (generic_handle_irq) from [<c0317748>] (keystone_irq_handler+0xf8/0x240)
    [<c0317748>] (keystone_irq_handler) from [<c00830f8>] (irq_forced_thread_fn+0x20/0x74)
    [<c00830f8>] (irq_forced_thread_fn) from [<c0083470>] (irq_thread+0x15c/0x230)
    [<c0083470>] (irq_thread) from [<c0044898>] (kthread+0xf0/0x108)
    [<c0044898>] (kthread) from [<c00102d0>] (ret_from_fork+0x14/0x24)
    remoteproc0: handling crash #1 in 10800000.dsp0!!
    remoteproc0: recovering 10800000.dsp0
    remoteproc0: stopped remote processor 10800000.dsp0
    remoteproc0: powering up 10800000.dsp0
    remoteproc0: Booting fw image keystone-dsp0-fw, size 3704928
    remoteproc0: remote processor 10800000.dsp0 is now up
    virtio_rpmsg_bus virtio0: rpmsg host is online
    virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
    remoteproc0: registered virtio0 (type 7)

Signed-off-by: Suman Anna <s-anna@ti.com>

Fix torvalds#331

test_getdents64() doesn't test the return value of snprintf(), so it ends up with a stack overflow when it continues to call snprintf().

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
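The commit message above does not show the affected code, so the following is only a minimal userspace sketch of one way an unchecked snprintf() can run past a stack buffer, and how checking the return value avoids it. The helper name `append_name` and its parameters are hypothetical and are not taken from test_getdents64(). snprintf() returns the length the output would have had without truncation, so blindly adding that value to an offset can move the offset past the end of the buffer, after which `size - off` wraps around to a huge value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * append "name " at buf + *off and advance *off only if the whole string,
 * including the terminating NUL, fit into the remaining space.
 * Returns 0 on success and -1 on an output error or truncation.
 */
static int append_name(char *buf, size_t size, size_t *off, const char *name)
{
    size_t room;
    int n;

    assert(buf != NULL && off != NULL && name != NULL);
    assert(*off < size);

    room = size - *off;
    n = snprintf(buf + *off, room, "%s ", name);
    if (n < 0)
        return -1;          /* output error reported by snprintf() */
    if ((size_t)n >= room)
        return -1;          /* truncated: do not advance the offset */

    *off += (size_t)n;
    return 0;
}
```

A caller that stops at the first -1 never lets the offset exceed the buffer size, so later calls cannot be handed a wrapped-around length.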

Netconsole support for XU4's network card

Inside print_request(), we query the context/timeline name. Nothing immediately protects the context from being freed if the request is complete -- we rely on serialisation by the caller to keep the name valid until they finish using it.

Inside intel_engine_dump(), we generally only print the requests in the execution queue protected by the engine->active.lock, but we also show the pending execlists ports which are not protected and so require an rcu_read_lock to keep the pointer valid.

    [ 1695.700883] BUG: KASAN: use-after-free in i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.700981] Read of size 8 at addr ffff8887344f4d50 by task gem_ctx_persist/2968
    [ 1695.701068]
    [ 1695.701156] CPU: 1 PID: 2968 Comm: gem_ctx_persist Tainted: G U 5.4.0-rc6+ torvalds#331
    [ 1695.701246] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
    [ 1695.701334] Call Trace:
    [ 1695.701424] dump_stack+0x5b/0x90
    [ 1695.701870] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.701964] print_address_description.constprop.7+0x36/0x50
    [ 1695.702408] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.702856] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.702947] __kasan_report.cold.10+0x1a/0x3a
    [ 1695.703390] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.703836] i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.704241] print_request+0x82/0x2e0 [i915]
    [ 1695.704638] ? fwtable_read32+0x133/0x360 [i915]
    [ 1695.705042] ? write_timestamp+0x110/0x110 [i915]
    [ 1695.705133] ? _raw_spin_lock_irqsave+0x79/0xc0
    [ 1695.705221] ? refcount_inc_not_zero_checked+0x91/0x110
    [ 1695.705306] ? refcount_dec_and_mutex_lock+0x50/0x50
    [ 1695.705709] ? intel_engine_find_active_request+0x202/0x230 [i915]
    [ 1695.706115] intel_engine_dump+0x2c9/0x900 [i915]

Fixes: c36eebd ("drm/i915/gt: execlists->active is serialised by the tasklet")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Inside print_request(), we query the context/timeline name. Nothing immediately protects the context from being freed if the request is complete -- we rely on serialisation by the caller to keep the name valid until they finish using it.

Inside intel_engine_dump(), we generally only print the requests in the execution queue protected by the engine->active.lock, but we also show the pending execlists ports which are not protected and so require a rcu_read_lock to keep the pointer valid.

    [ 1695.700883] BUG: KASAN: use-after-free in i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.700981] Read of size 8 at addr ffff8887344f4d50 by task gem_ctx_persist/2968
    [ 1695.701068]
    [ 1695.701156] CPU: 1 PID: 2968 Comm: gem_ctx_persist Tainted: G U 5.4.0-rc6+ torvalds#331
    [ 1695.701246] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
    [ 1695.701334] Call Trace:
    [ 1695.701424] dump_stack+0x5b/0x90
    [ 1695.701870] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.701964] print_address_description.constprop.7+0x36/0x50
    [ 1695.702408] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.702856] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.702947] __kasan_report.cold.10+0x1a/0x3a
    [ 1695.703390] ? i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.703836] i915_fence_get_timeline_name+0x53/0x90 [i915]
    [ 1695.704241] print_request+0x82/0x2e0 [i915]
    [ 1695.704638] ? fwtable_read32+0x133/0x360 [i915]
    [ 1695.705042] ? write_timestamp+0x110/0x110 [i915]
    [ 1695.705133] ? _raw_spin_lock_irqsave+0x79/0xc0
    [ 1695.705221] ? refcount_inc_not_zero_checked+0x91/0x110
    [ 1695.705306] ? refcount_dec_and_mutex_lock+0x50/0x50
    [ 1695.705709] ? intel_engine_find_active_request+0x202/0x230 [i915]
    [ 1695.706115] intel_engine_dump+0x2c9/0x900 [i915]

Fixes: c36eebd ("drm/i915/gt: execlists->active is serialised by the tasklet")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111114323.5833-1-chris@chris-wilson.co.uk
(cherry picked from commit fecffa4)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

I've been chasing down the following flaky splat, introduced by recent changes in BTF generation [1]:

    ------------[ cut here ]------------
    BUG: unable to handle page fault for address: ffa000000233d828
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
    Oops: Oops: 0000 [#1] SMP NOPTI
    CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W OE 6.19.0-rc1-gf785a31395d9 torvalds#331 PREEMPT(full)
    Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
    RIP: 0010:simplify_symbols+0x2b2/0x480
    9.737179] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
    RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
    RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
    RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
    RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
    R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
    R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
    FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
    PKRU: 55555554
    Call Trace:
    <TASK>
    ? __kmalloc_node_track_caller_noprof+0x37f/0x740
    ? __pfx_setup_modinfo_srcversion+0x10/0x10
    ? srso_alias_return_thunk+0x5/0xfbef5
    ? kstrdup+0x4a/0x70
    ? srso_alias_return_thunk+0x5/0xfbef5
    ? setup_modinfo_srcversion+0x1a/0x30
    ? srso_alias_return_thunk+0x5/0xfbef5
    ? setup_modinfo+0x12b/0x1e0
    load_module+0x133a/0x1610
    __x64_sys_finit_module+0x31b/0x450
    ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
    do_syscall_64+0x80/0x2d0
    ? srso_alias_return_thunk+0x5/0xfbef5
    ? exc_page_fault+0x95/0xc0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7f1c63a2582d
    9.794028] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
    RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
    RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
    R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
    </TASK>
    Modules linked in: bpf_testmod(OE)
    CR2: ffa000000233d828
    ---[ end trace 0000000000000000 ]---
    RIP: 0010:simplify_symbols+0x2b2/0x480
    9.821595] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
    RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
    RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
    RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
    RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
    R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
    R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
    FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
    PKRU: 55555554
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled

This hasn't happened on BPF CI so far, for example, however I was able to reproduce it on a particular x64 machine using a kernel built with LLVM 20.

The crash happens on attempt to load one of the BPF selftest modules (tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko) which is used by kfunc_module_order test.
The reason for the crash is that simplify_symbols() doesn't check for bounds of the ELF section index:

    for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
        const char *name = info->strtab + sym[i].st_name;

        switch (sym[i].st_shndx) {
        case SHN_COMMON:
            [...]
        default:
            /* Divert to percpu allocation if a percpu var. */
            if (sym[i].st_shndx == info->index.pcpu)
                secbase = (unsigned long)mod_percpu(mod);
            else
                /** HERE --> **/ secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
            sym[i].st_value += secbase;
            break;
        }
    }

And in the case I was able to reproduce, the value 0xffff (SHN_HIRESERVE aka SHN_XINDEX [2]) fell through here.

Now this code fragment is between 15 and 20 years old, so obviously it's not expected for a kmodule symbol to have such st_shndx value. Even so, the kernel probably should fail loading the module instead of crashing, which is what this patch attempts to fix.

Investigating further, I discovered that the module binary became corrupted by `${OBJCOPY} --update-section` operation updating .BTF_ids section data in scripts/gen-btf.sh.
This explains how the bug has surfaced after gen-btf.sh was introduced:

    $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
    llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (2), but unable to locate the extended symbol index table
    llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (3), but unable to locate the extended symbol index table
    llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (4), but unable to locate the extended symbol index table
    3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff] __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
    llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (16), but unable to locate the extended symbol index table
    4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff] __BTF_ID__func__bpf_test_modorder_retx__44417

vs expected

    $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
    3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6 __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
    4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6 __BTF_ID__func__bpf_test_modorder_retx__44417

But why? Updating section data without changing its size is not supposed to affect section indices, right?

With a bit more testing I confirmed that this is an LLVM-specific issue (doesn't reproduce with GCC kbuild), and it's not stable, because in link-vmlinux.sh we also do:

    ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}

However:

    $ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep 0xffff
    # no output, which is good

So the suspect is the implementation of llvm-objcopy. As it turns out there is a relevant known bug that explains the flakiness and isn't fixed yet [3].

[1] https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
[2] https://man7.org/linux/man-pages/man5/elf.5.html
[3] llvm/llvm-project#168060 (comment)

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
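The commit message above describes the missing bounds check on st_shndx but does not quote the fix itself. The following is a minimal userspace sketch of what such a check could look like for the default case of the switch, assuming the types and constants from glibc's `<elf.h>`. The helper name `resolve_secbase` and its signature are hypothetical; this is not the patch's actual code.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not the kernel's
 * simplify_symbols()): look up the base address of the section a symbol
 * belongs to, refusing any st_shndx that cannot index sechdrs[].
 * Returns 0 and stores the address in *secbase, or -1 if the index is a
 * reserved value (such as SHN_XINDEX, 0xffff) or is >= shnum.
 */
static int resolve_secbase(const Elf64_Shdr *sechdrs, size_t shnum,
                           const Elf64_Sym *sym, Elf64_Addr *secbase)
{
    assert(sechdrs != NULL && sym != NULL && secbase != NULL);

    /* Reserved indices are markers, not positions in the section table. */
    if (sym->st_shndx >= SHN_LORESERVE)
        return -1;

    /* An ordinary index must still be inside the section header table. */
    if (sym->st_shndx >= shnum)
        return -1;

    *secbase = sechdrs[sym->st_shndx].sh_addr;
    return 0;
}
```

With a check of this kind, a symbol carrying 0xffff is rejected with an error the loader can propagate, instead of being used to read far beyond the section header array.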

FSM Client and Server Beta 1

0d0031e

tobetter pushed a commit to tobetter/linux that referenced this pull request Dec 23, 2017

Merge pull request torvalds#331 from mad-ady/odroidxu4-4.14.y

a98b3e4

Netconsole support for XU4's network card

torvalds closed this Sep 22, 2025
