-
Notifications
You must be signed in to change notification settings - Fork 579
Serializer Closing Fix #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
luctius
wants to merge
70
commits into
beagleboard:3.14
from
incas3:davinci-mcasp_closing_of_streams_fix
Closed
Serializer Closing Fix #15
luctius
wants to merge
70
commits into
beagleboard:3.14
from
incas3:davinci-mcasp_closing_of_streams_fix
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
#git://git.ti.com/ti-cm3-pm-firmware/amx3-cm3.git #4e219d5053ee41b8fa8f85b48b1529ae4c6feb48 Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
This patch was derived from 2 commits, but I removed the non-pinmux-helper portions
and added the helper to the bb.org_defconfig:
capemgr: Capemgr makefiles and Kconfig fragments.
Introduce a cape loader using DT overlays and dynamic
DT objects.
Makefile and Kconfig fragments.
Signed-off-by: Pantelis Antoniou <panto@antoniou-consulting.com>
Conflicts:
arch/arm/mach-omap2/Kconfig
drivers/misc/Kconfig
drivers/misc/Makefile
And:
Pinmux helper driver.
That's just a hack to get a pinmux helper driver working.
Define in the DT
helper {
compatible = "bone-pinmux-helper";
pinctrl-names = "default";
pinctrl-0 = <&helper_pins>;
status = "okay";
};
Pinctrl already supports multiple states. Just make them visible.
devm_kfree warned out... why? no idea.
A gpio OF helper driver that allows configuration to be done via DT.
Signed-off-by: Charles Steinkuehler <charles@steinkuehler.net>
Signed-off-by: Jason Kridner <jdk@ti.com>
This adds gpio and pinmux helpers to the majority of available expansion header pins based on the cape-universal work from Charles Steinkuehler making them userspace configurable. This is not a substitute for Capemgr as it doesn't perform the configuration based on cape detection, nor does it enable dynamic configuration of all types of peripherals that could be on a cape. It does, however, enable many developers to rapidly experiment with a lesser degree of complexity. Derived from: https://github.com/cdsteinkuehler/beaglebone-universal-io/blob/52461b52ef3203e648399c16c7e160c848a04b5c$ Signed-off-by: Jason Kridner <jdk@ti.com> Cc: Charles Steinkuehler <charles@steinkuehler.net> Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
…new mode parameter is used to set the initial pinmux mode to something other than "default" or NULL, which is what happens currently. This allows enabling SoC hardware via device-tree which requires specific pinmux settings to function on boot, but still leaves the pinmux register under control of the bone-pinmux- helper driver meaning the pinmux setting can be changed at run time via user-mode access to sysfs. Signed-off-by: Charles Steinkuehler <charles@steinkuehler.net>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
BeagleBone DTS : Enable run-time pinmux for HDMI Add cape-universal-hdmi pin info to am335x-bone-common-pinmux.dtsi Edit hdmi dtsi include files to use new mode= setting to set HDMI mode at startup, leaving pinmux configurable at runtime. Signed-off-by: Charles Steinkuehler <charles@steinkuehler.net> Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Adrian Remonda <adrianremonda@gmail.com> Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
(split into pinmux and cape patches rcn-ee) Signed-off-by: John Syn <john3909@gmail.com> Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
crow-misia
pushed a commit
to crow-misia/linux
that referenced
this pull request
Jun 4, 2019
[ Upstream commit 36a2ba0 ] In a system where, through IORT firmware mappings, the SMMU device is mapped to a NUMA node that is not online, the kernel bootstrap results in the following crash: Unable to handle kernel paging request at virtual address 0000000000001388 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [0000000000001388] user address but active_mm is swapper Internal error: Oops: 96000004 [beagleboard#1] SMP Modules linked in: CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 beagleboard#15 pstate: 80c00009 (Nzcv daif +PAN +UAO) pc : __alloc_pages_nodemask+0x13c/0x1068 lr : __alloc_pages_nodemask+0xdc/0x1068 ... Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____)) Call trace: __alloc_pages_nodemask+0x13c/0x1068 new_slab+0xec/0x570 ___slab_alloc+0x3e0/0x4f8 __slab_alloc+0x60/0x80 __kmalloc_node_track_caller+0x10c/0x478 devm_kmalloc+0x44/0xb0 pinctrl_bind_pins+0x4c/0x188 really_probe+0x78/0x2b8 driver_probe_device+0x64/0x110 device_driver_attach+0x74/0x98 __driver_attach+0x9c/0xe8 bus_for_each_dev+0x84/0xd8 driver_attach+0x30/0x40 bus_add_driver+0x170/0x218 driver_register+0x64/0x118 __platform_driver_register+0x54/0x60 arm_smmu_driver_init+0x24/0x2c do_one_initcall+0xbc/0x328 kernel_init_freeable+0x304/0x3ac kernel_init+0x18/0x110 ret_from_fork+0x10/0x1c Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804) ---[ end trace dfeaed4c373a32da ]-- Change the dev_set_proximity() hook prototype so that it returns a value and make it return failure if the PXM->NUMA-node mapping corresponds to an offline node, fixing the crash. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/linux-arm-kernel/20190315021940.86905-1-wangkefeng.wang@huawei.com/ Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Sep 25, 2019
commit d0a255e upstream. A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Nov 4, 2019
commit 5f92427 upstream. The syzbot fuzzer found a general protection fault in the HID subsystem: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 3715 Comm: syz-executor.3 Not tainted 5.2.0-rc6+ #15 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__pm_runtime_resume+0x49/0x180 drivers/base/power/runtime.c:1069 Code: ed 74 d5 fe 45 85 ed 0f 85 9a 00 00 00 e8 6f 73 d5 fe 48 8d bd c1 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 fe 00 00 00 RSP: 0018:ffff8881d99d78e0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000020 RCX: ffffc90003f3f000 RDX: 0000000416d8686d RSI: ffffffff82676841 RDI: 00000020b6c3436a RBP: 00000020b6c340a9 R08: ffff8881c6d64800 R09: fffffbfff0e84c25 R10: ffff8881d99d7940 R11: ffffffff87426127 R12: 0000000000000004 R13: 0000000000000000 R14: ffff8881d9b94000 R15: ffffffff897f9048 FS: 00007f047f542700(0000) GS:ffff8881db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b30f21000 CR3: 00000001ca032000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: pm_runtime_get_sync include/linux/pm_runtime.h:226 [inline] usb_autopm_get_interface+0x1b/0x50 drivers/usb/core/driver.c:1707 usbhid_power+0x7c/0xe0 drivers/hid/usbhid/hid-core.c:1234 hid_hw_power include/linux/hid.h:1038 [inline] hidraw_open+0x20d/0x740 drivers/hid/hidraw.c:282 chrdev_open+0x219/0x5c0 fs/char_dev.c:413 do_dentry_open+0x497/0x1040 fs/open.c:778 do_last fs/namei.c:3416 [inline] path_openat+0x1430/0x3ff0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3c0/0x580 fs/open.c:1070 do_syscall_64+0xb7/0x560 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe It turns out the fault was caused by a bug in the HID Logitech driver, which violates the requirement that every pathway calling hid_hw_start() must also call hid_hw_stop(). This patch fixes the bug by making sure the requirement is met. Reported-and-tested-by: syzbot+3cbe5cd105d2ad56a1df@syzkaller.appspotmail.com Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@vger.kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fhgwright
pushed a commit
to fhgwright/bb-linux
that referenced
this pull request
Jan 23, 2020
If FF request comes in while uinput device is going away, uinput_request_send() will fail with -ENODEV, and uinput_request_submit() will attempt to mark the slot as unused by calling uinput_request_done(). Unfortunately in this case we haven't initialized request->done completion yet, and we get a crash: [ 39.402036] BUG: spinlock bad magic on CPU#1, fftest/3108 [ 39.402046] lock: 0xffff88006a93bb00, .magic: 00000000, .owner: /39, .owner_cpu: 1217155072 [ 39.402055] CPU: 1 PID: 3108 Comm: fftest Tainted: G W 4.13.0+ beagleboard#15 [ 39.402059] Hardware name: LENOVO 20HQS0EG02/20HQS0EG02, BIOS N1MET37W (1.22 ) 07/04/2017 [ 39.402064] 0000000000000086 f0fad82f3ceaa120 ffff88006a93b9a0 ffffffff9de941bb [ 39.402077] ffff88026df8ae00 ffff88006a93bb00 ffff88006a93b9c0 ffffffff9dca62b7 [ 39.402088] ffff88006a93bb00 ffff88006a93baf8 ffff88006a93b9e0 ffffffff9dca62e7 [ 39.402099] Call Trace: [ 39.402112] [<ffffffff9de941bb>] dump_stack+0x4d/0x63 [ 39.402123] [<ffffffff9dca62b7>] spin_dump+0x97/0x9c [ 39.402130] [<ffffffff9dca62e7>] spin_bug+0x2b/0x2d [ 39.402138] [<ffffffff9dca6373>] do_raw_spin_lock+0x28/0xfd [ 39.402147] [<ffffffff9e3055cd>] _raw_spin_lock_irqsave+0x19/0x1f [ 39.402154] [<ffffffff9dca05b7>] complete+0x1d/0x48 [ 39.402162] [<ffffffffc04f30af>] 0xffffffffc04f30af [ 39.402167] [<ffffffffc04f468c>] 0xffffffffc04f468c [ 39.402177] [<ffffffff9dd59c16>] ? __slab_free+0x22f/0x359 [ 39.402184] [<ffffffff9dcc13e9>] ? tk_clock_read+0xc/0xe [ 39.402189] [<ffffffffc04f471f>] 0xffffffffc04f471f [ 39.402195] [<ffffffff9dc9ffe5>] ? __wake_up+0x44/0x4b [ 39.402200] [<ffffffffc04f3240>] ? 0xffffffffc04f3240 [ 39.402207] [<ffffffff9e0f57f3>] erase_effect+0xa1/0xd2 [ 39.402214] [<ffffffff9e0f58c6>] input_ff_flush+0x43/0x5c [ 39.402219] [<ffffffffc04f32ad>] 0xffffffffc04f32ad [ 39.402227] [<ffffffff9e0f174f>] input_flush_device+0x3d/0x51 [ 39.402234] [<ffffffff9e0f69ae>] evdev_flush+0x49/0x5c [ 39.402243] [<ffffffff9dd62d6e>] filp_close+0x3f/0x65 [ 39.402253] [<ffffffff9dd7dcf7>] put_files_struct+0x66/0xc1 [ 39.402261] [<ffffffff9dd7ddeb>] exit_files+0x47/0x4e [ 39.402270] [<ffffffff9dc6b329>] do_exit+0x483/0x969 [ 39.402278] [<ffffffff9dc73211>] ? recalc_sigpending_tsk+0x3d/0x44 [ 39.402285] [<ffffffff9dc6c7a2>] do_group_exit+0x42/0xb0 [ 39.402293] [<ffffffff9dc767e1>] get_signal+0x58d/0x5bf [ 39.402300] [<ffffffff9dc03701>] do_signal+0x37/0x53e [ 39.402307] [<ffffffff9e0f8401>] ? evdev_ioctl_handler+0xac8/0xb04 [ 39.402314] [<ffffffff9e0f8464>] ? evdev_ioctl+0x10/0x12 [ 39.402321] [<ffffffff9dd74cfa>] ? do_vfs_ioctl+0x42e/0x501 [ 39.402328] [<ffffffff9dc0170e>] prepare_exit_to_usermode+0x66/0x90 [ 39.402333] [<ffffffff9dc0181b>] syscall_return_slowpath+0xe3/0xec [ 39.402339] [<ffffffff9e305b7b>] int_ret_from_sys_call+0x25/0x8f While we could solve this by simply initializing the completion earlier, we are better off rearranging the code a bit so we avoid calling complete() on requests that we did not send out. This patch consolidates marking request slots as free in one place (in uinput_request_submit(), the same place where we acquire them) and having everyone else simply signal completion of the requests. Fixes: 00ce756 ("Input: uinput - mark failed submission requests as free") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
RobertCNelson
pushed a commit
that referenced
this pull request
May 9, 2020
[ Upstream commit 1bc7896 ] When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d #17 [ffffc9002219fc90] schedule at ffffffff81a3e219 #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: # --- a/tools/trace.py # +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> #1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Jun 15, 2020
commit f8f482d upstream. When the kernel is built with lockdep support and the owl-dma driver is used, the following message is shown: [ 2.496939] INFO: trying to register non-static key. [ 2.501889] the code is fine but needs lockdep annotation. [ 2.507357] turning off the locking correctness validator. [ 2.512834] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.6.3+ #15 [ 2.519084] Hardware name: Generic DT based system [ 2.523878] Workqueue: events_freezable mmc_rescan [ 2.528681] [<801127f0>] (unwind_backtrace) from [<8010da58>] (show_stack+0x10/0x14) [ 2.536420] [<8010da58>] (show_stack) from [<8080fbe8>] (dump_stack+0xb4/0xe0) [ 2.543645] [<8080fbe8>] (dump_stack) from [<8017efa4>] (register_lock_class+0x6f0/0x718) [ 2.551816] [<8017efa4>] (register_lock_class) from [<8017b7d0>] (__lock_acquire+0x78/0x25f0) [ 2.560330] [<8017b7d0>] (__lock_acquire) from [<8017e5e4>] (lock_acquire+0xd8/0x1f4) [ 2.568159] [<8017e5e4>] (lock_acquire) from [<80831fb0>] (_raw_spin_lock_irqsave+0x3c/0x50) [ 2.576589] [<80831fb0>] (_raw_spin_lock_irqsave) from [<8051b5fc>] (owl_dma_issue_pending+0xbc/0x120) [ 2.585884] [<8051b5fc>] (owl_dma_issue_pending) from [<80668cbc>] (owl_mmc_request+0x1b0/0x390) [ 2.594655] [<80668cbc>] (owl_mmc_request) from [<80650ce0>] (mmc_start_request+0x94/0xbc) [ 2.602906] [<80650ce0>] (mmc_start_request) from [<80650ec0>] (mmc_wait_for_req+0x64/0xd0) [ 2.611245] [<80650ec0>] (mmc_wait_for_req) from [<8065aa10>] (mmc_app_send_scr+0x10c/0x144) [ 2.619669] [<8065aa10>] (mmc_app_send_scr) from [<80659b3c>] (mmc_sd_setup_card+0x4c/0x318) [ 2.628092] [<80659b3c>] (mmc_sd_setup_card) from [<80659f0c>] (mmc_sd_init_card+0x104/0x430) [ 2.636601] [<80659f0c>] (mmc_sd_init_card) from [<8065a3e0>] (mmc_attach_sd+0xcc/0x16c) [ 2.644678] [<8065a3e0>] (mmc_attach_sd) from [<8065301c>] (mmc_rescan+0x3ac/0x40c) [ 2.652332] [<8065301c>] (mmc_rescan) from [<80143244>] (process_one_work+0x2d8/0x780) [ 2.660239] [<80143244>] (process_one_work) from [<80143730>] (worker_thread+0x44/0x598) [ 2.668323] [<80143730>] (worker_thread) from [<8014b5f8>] (kthread+0x148/0x150) [ 2.675708] [<8014b5f8>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20) [ 2.682912] Exception stack(0xee8fdfb0 to 0xee8fdff8) [ 2.687954] dfa0: 00000000 00000000 00000000 00000000 [ 2.696118] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 2.704277] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 The obvious fix would be to use 'spin_lock_init()' on 'pchan->lock' before attempting to call 'spin_lock_irqsave()' in 'owl_dma_get_pchan()'. However, according to Manivannan Sadhasivam, 'pchan->lock' was supposed to only protect 'pchan->vchan' while 'od->lock' does a similar job in 'owl_dma_terminate_pchan()'. Therefore, this patch substitutes 'pchan->lock' with 'od->lock' and removes the 'lock' attribute in 'owl_dma_pchan' struct. Fixes: 47e2057 ("dmaengine: Add Actions Semi Owl family S900 DMA driver") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Andreas Färber <afaerber@suse.de> Link: https://lore.kernel.org/r/c6e6cdaca252b5364bd294093673951036488cf0.1588439073.git.cristian.ciocaltea@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Sep 25, 2020
…up_skb [ Upstream commit 394de11 ] The packets from tunnel devices (eg bareudp) may have only metadata in the dst pointer of skb. Hence a pointer check of neigh_lookup is needed in dst_neigh_lookup_skb Kernel crashes when packets from bareudp device is processed in the kernel neighbour subsytem. [ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 133.385240] #PF: supervisor instruction fetch in kernel mode [ 133.385828] #PF: error_code(0x0010) - not-present page [ 133.386603] PGD 0 P4D 0 [ 133.386875] Oops: 0010 [#1] SMP PTI [ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15 [ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 133.391076] RIP: 0010:0x0 [ 133.392401] Code: Bad RIP value. [ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246 [ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00 [ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400 [ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000 [ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400 [ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003 [ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000 [ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0 [ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 133.404933] Call Trace: [ 133.405169] <IRQ> [ 133.405367] __neigh_update+0x5a4/0x8f0 [ 133.405734] arp_process+0x294/0x820 [ 133.406076] ? __netif_receive_skb_core+0x866/0xe70 [ 133.406557] arp_rcv+0x129/0x1c0 [ 133.406882] __netif_receive_skb_one_core+0x95/0xb0 [ 133.407340] process_backlog+0xa7/0x150 [ 133.407705] net_rx_action+0x2af/0x420 [ 133.408457] __do_softirq+0xda/0x2a8 [ 133.408813] asm_call_on_stack+0x12/0x20 [ 133.409290] </IRQ> [ 133.409519] do_softirq_own_stack+0x39/0x50 [ 133.410036] do_softirq+0x50/0x60 [ 133.410401] __local_bh_enable_ip+0x50/0x60 [ 133.410871] ip_finish_output2+0x195/0x530 [ 133.411288] ip_output+0x72/0xf0 [ 133.411673] ? __ip_finish_output+0x1f0/0x1f0 [ 133.412122] ip_send_skb+0x15/0x40 [ 133.412471] raw_sendmsg+0x853/0xab0 [ 133.412855] ? insert_pfn+0xfe/0x270 [ 133.413827] ? vvar_fault+0xec/0x190 [ 133.414772] sock_sendmsg+0x57/0x80 [ 133.415685] __sys_sendto+0xdc/0x160 [ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0 [ 133.417679] ? __audit_syscall_exit+0x1d9/0x280 [ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0 [ 133.419819] __x64_sys_sendto+0x24/0x30 [ 133.420848] do_syscall_64+0x4d/0x90 [ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 133.422833] RIP: 0033:0x7fe013689c03 [ 133.423749] Code: Bad RIP value. [ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03 [ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003 [ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010 [ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080 [ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod [ 133.444045] CR2: 0000000000000000 [ 133.445082] ---[ end trace f4aeee1958fd1638 ]--- [ 133.446236] RIP: 0010:0x0 [ 133.447180] Code: Bad RIP value. [ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246 [ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00 [ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400 [ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000 [ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400 [ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003 [ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000 [ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0 [ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt [ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: aaa0c23 ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Sep 25, 2020
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Oct 27, 2020
[ Upstream commit b12eea5 ] The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 #10 0x559fbbc0109b in run_test tests/builtin-test.c:410 #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
May 14, 2021
commit 4d14c5c upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 #10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED #11 touch_atime at ffffffffa4cf0000 #12 generic_file_read_iter at ffffffffa4c1f123 #13 new_sync_read at ffffffffa4ccdc8a #14 vfs_read at ffffffffa4cd0849 #15 ksys_read at ffffffffa4cd0bd1 #16 do_syscall_64 at ffffffffa4a052eb #17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f #10 __extent_writepage at ffffffffc03a4c01 #11 extent_write_cache_pages at ffffffffc03a500b #12 extent_writepages at ffffffffc03a6de2 #13 do_writepages at ffffffffa4c277eb #14 __filemap_fdatawrite_range at ffffffffa4c1e5bb #15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes #16 normal_work_helper at ffffffffc03b706c #17 process_one_work at ffffffffa4aba4e4 #18 worker_thread at ffffffffa4aba6fd #19 kthread at ffffffffa4ac0a3d #20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
May 14, 2021
commit 0bb7883 upstream. While removing a qgroup's sysfs entry we end up taking the kernfs_mutex, through kobject_del(), while holding the fs_info->qgroup_lock spinlock, producing the following trace: [821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman [821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G W 5.11.6 #15 [821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020 [821.843647] Call Trace: [821.843650] dump_stack+0xa1/0xfb [821.843656] ___might_sleep+0x144/0x160 [821.843659] mutex_lock+0x17/0x40 [821.843662] kernfs_remove_by_name_ns+0x1f/0x80 [821.843666] sysfs_remove_group+0x7d/0xe0 [821.843668] sysfs_remove_groups+0x28/0x40 [821.843670] kobject_del+0x2a/0x80 [821.843672] btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs] [821.843685] __del_qgroup_rb+0x12/0x150 [btrfs] [821.843696] btrfs_remove_qgroup+0x288/0x2a0 [btrfs] [821.843707] btrfs_ioctl+0x3129/0x36a0 [btrfs] [821.843717] ? __mod_lruvec_page_state+0x5e/0xb0 [821.843719] ? page_add_new_anon_rmap+0xbc/0x150 [821.843723] ? kfree+0x1b4/0x300 [821.843725] ? mntput_no_expire+0x55/0x330 [821.843728] __x64_sys_ioctl+0x5a/0xa0 [821.843731] do_syscall_64+0x33/0x70 [821.843733] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [821.843736] RIP: 0033:0x4cd3fb [821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb [821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f [821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000 [821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d [821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049 Fix this by removing the qgroup sysfs entry while not holding the spinlock, since the spinlock is only meant for protection of the qgroup rbtree. Reported-by: Stuart Shelton <srcshelton@gmail.com> Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/ Fixes: 49e5fb4 ("btrfs: qgroup: export qgroups in sysfs") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Oct 1, 2021
commit 41d5854 upstream. I got several memory leak reports from Asan with a simple command. It was because VDSO is not released due to the refcount. Like in __dsos_addnew_id(), it should put the refcount after adding to the list. $ perf record true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ] ================================================================= ==692599==ERROR: LeakSanitizer: detected memory leaks Direct leak of 439 byte(s) in 1 object(s) allocated from: #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256 #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132 #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347 #4 0x559bce50826c in map__new util/map.c:175 #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787 #6 0x559bce512f6b in machines__deliver_event util/session.c:1481 #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551 #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244 #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323 #10 0x559bce519bea in __perf_session__process_events util/session.c:2268 #11 0x559bce519bea in perf_session__process_events util/session.c:2297 #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017 #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234 #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026 #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858 #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308 Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169 #2 0x559bce50821b in map__new util/map.c:168 #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787 #4 0x559bce512f6b in machines__deliver_event util/session.c:1481 #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551 #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244 #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323 #8 0x559bce519bea in __perf_session__process_events util/session.c:2268 #9 0x559bce519bea in perf_session__process_events util/session.c:2297 #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017 #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234 #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026 #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858 #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Dec 17, 2021
commit 57f0ff0 upstream. It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 #9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit 1fb7d06, it adds the srcline property into the struct, but not initializing it everywhere needed. Committer notes: Now I see, when using --ignore-callees=do_idle we end up here at line 2189 in add_callchain_ip(): 2181 if (al.sym != NULL) { 2182 if (perf_hpp_list.parent && !*parent && 2183 symbol__match_regex(al.sym, &parent_regex)) 2184 *parent = al.sym; 2185 else if (have_ignore_callees && root_al && 2186 symbol__match_regex(al.sym, &ignore_callees_regex)) { 2187 /* Treat this symbol as the root, 2188 forgetting its callees. */ 2189 *root_al = al; 2190 callchain_cursor_reset(cursor); 2191 } 2192 } And the al that doesn't have the ->srcline field initialized will be copied to the root_al, so then, back to: 1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, 1212 int max_stack_depth, void *arg) 1213 { 1214 int err, err2; 1215 struct map *alm = NULL; 1216 1217 if (al) 1218 alm = map__get(al->map); 1219 1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent, 1221 iter->evsel, al, max_stack_depth); 1222 if (err) { 1223 map__put(alm); 1224 return err; 1225 } 1226 1227 err = iter->ops->prepare_entry(iter, al); 1228 if (err) 1229 goto out; 1230 1231 err = iter->ops->add_single_entry(iter, al); 1232 if (err) 1233 goto out; 1234 That al at line 1221 is what hist_entry_iter__add() (called from sample__resolve_callchain()) saw as 'root_al', and then: iter->ops->add_single_entry(iter, al); will go on with al->srcline with a bogus value, I'll add the above sequence to the cset and apply, thanks! Signed-off-by: Michael Petlan <mpetlan@redhat.com> CC: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries") Link: https //lore.kernel.org/r/20210719145332.29747-1-mpetlan@redhat.com Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Jan 12, 2022
…elect() commit e0a2c28 upstream. In resp_mode_select() sanity check the block descriptor len to avoid UAF. BUG: KASAN: use-after-free in resp_mode_select+0xa4c/0xb40 drivers/scsi/scsi_debug.c:2509 Read of size 1 at addr ffff888026670f50 by task scsicmd/15032 CPU: 1 PID: 15032 Comm: scsicmd Not tainted 5.15.0-01d0625 #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Call Trace: <TASK> dump_stack_lvl+0x89/0xb5 lib/dump_stack.c:107 print_address_description.constprop.9+0x28/0x160 mm/kasan/report.c:257 kasan_report.cold.14+0x7d/0x117 mm/kasan/report.c:443 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report_generic.c:306 resp_mode_select+0xa4c/0xb40 drivers/scsi/scsi_debug.c:2509 schedule_resp+0x4af/0x1a10 drivers/scsi/scsi_debug.c:5483 scsi_debug_queuecommand+0x8c9/0x1e70 drivers/scsi/scsi_debug.c:7537 scsi_queue_rq+0x16b4/0x2d10 drivers/scsi/scsi_lib.c:1521 blk_mq_dispatch_rq_list+0xb9b/0x2700 block/blk-mq.c:1640 __blk_mq_sched_dispatch_requests+0x28f/0x590 block/blk-mq-sched.c:325 blk_mq_sched_dispatch_requests+0x105/0x190 block/blk-mq-sched.c:358 __blk_mq_run_hw_queue+0xe5/0x150 block/blk-mq.c:1762 __blk_mq_delay_run_hw_queue+0x4f8/0x5c0 block/blk-mq.c:1839 blk_mq_run_hw_queue+0x18d/0x350 block/blk-mq.c:1891 blk_mq_sched_insert_request+0x3db/0x4e0 block/blk-mq-sched.c:474 blk_execute_rq_nowait+0x16b/0x1c0 block/blk-exec.c:63 sg_common_write.isra.18+0xeb3/0x2000 drivers/scsi/sg.c:837 sg_new_write.isra.19+0x570/0x8c0 drivers/scsi/sg.c:775 sg_ioctl_common+0x14d6/0x2710 drivers/scsi/sg.c:941 sg_ioctl+0xa2/0x180 drivers/scsi/sg.c:1166 __x64_sys_ioctl+0x19d/0x220 fs/ioctl.c:52 do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:50 entry_SYSCALL_64_after_hwframe+0x44/0xae arch/x86/entry/entry_64.S:113 Link: https://lore.kernel.org/r/1637262208-28850-1-git-send-email-george.kennedy@oracle.com Reported-by: syzkaller <syzkaller@googlegroups.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Mar 31, 2022
[ Upstream commit f53a2ce ] As explained in commits: 74b6d7d ("net: dsa: realtek: register the MDIO bus under devres") 5135e96 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The mv88e6xxx is an MDIO device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the Marvell switch driver on shutdown. systemd-shutdown[1]: Powering off. mv88e6085 0x0000000008b96000:00 sw_gl0: Link is Down fsl-mc dpbp.9: Removing from iommu group 7 fsl-mc dpbp.8: Removing from iommu group 7 ------------[ cut here ]------------ kernel BUG at drivers/net/phy/mdio_bus.c:677! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00040-gdc05f73788e5 #15 pc : mdiobus_free+0x44/0x50 lr : devm_mdiobus_free+0x10/0x20 Call trace: mdiobus_free+0x44/0x50 devm_mdiobus_free+0x10/0x20 devres_release_all+0xa0/0x100 __device_release_driver+0x190/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x4c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 kernel_power_off+0x34/0x6c __do_sys_reboot+0x15c/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The Marvell driver already has a good structure for mdiobus removal, so just plug in mdiobus_free and get rid of devres. Fixes: ac3a68d ("net: phy: don't abuse devres in devm_mdiobus_register()") Reported-by: Rafael Richter <Rafael.Richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Daniel Klauer <daniel.klauer@gin.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Mar 31, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Jun 28, 2022
commit 6d5aa41 upstream. The reference to `explicit_in_reply_to` is pointless as when the reference was added in the form of "#15" [1], Section 15) was "The canonical patch format". The reference of "#15" had not been properly updated in a couple of reorganizations during the plain-text SubmittingPatches era. Fix it by using `the_canonical_patch_format`. [1]: 2ae19ac ("Documentation: Add "how to write a good patch summary" to SubmittingPatches") Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Fixes: 5903019 ("Documentation/SubmittingPatches: convert it to ReST markup") Fixes: 9b2c767 ("Documentation/SubmittingPatches: enrich the Sphinx output") Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: stable@vger.kernel.org # v4.9+ Link: https://lore.kernel.org/r/64e105a5-50be-23f2-6cae-903a2ea98e18@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Aug 26, 2022
commit 050133e upstream. commit 0622cab ("bonding: fix 802.3ad aggregator reselection"), resolve case, when there is several aggregation groups in the same bond. bond_3ad_unbind_slave will invalidate (clear) aggregator when __agg_active_ports return zero. So, ad_clear_agg can be executed even, when num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for, previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave will not update slave ports list, because lag_ports==NULL. So, here we got slave ports, pointing to freed aggregator memory. Fix with checking actual number of ports in group (as was before commit 0622cab ("bonding: fix 802.3ad aggregator reselection") ), before ad_clear_agg(). The KASAN logs are as follows: [ 767.617392] ================================================================== [ 767.630776] BUG: KASAN: use-after-free in bond_3ad_state_machine_handler+0x13dc/0x1470 [ 767.638764] Read of size 2 at addr ffff00011ba9d430 by task kworker/u8:7/767 [ 767.647361] CPU: 3 PID: 767 Comm: kworker/u8:7 Tainted: G O 5.15.11 #15 [ 767.655329] Hardware name: DNI AmazonGo1 A7040 board (DT) [ 767.660760] Workqueue: lacp_1 bond_3ad_state_machine_handler [ 767.666468] Call trace: [ 767.668930] dump_backtrace+0x0/0x2d0 [ 767.672625] show_stack+0x24/0x30 [ 767.675965] dump_stack_lvl+0x68/0x84 [ 767.679659] print_address_description.constprop.0+0x74/0x2b8 [ 767.685451] kasan_report+0x1f0/0x260 [ 767.689148] __asan_load2+0x94/0xd0 [ 767.692667] bond_3ad_state_machine_handler+0x13dc/0x1470 Fixes: 0622cab ("bonding: fix 802.3ad aggregator reselection") Co-developed-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20220629012914.361-1-yevhen.orlov@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Dec 8, 2022
commit 4bb26f2 upstream. When inode is created and written to using direct IO, there is nothing to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets truncated later to say 1 byte and written using normal write, we will try to store the data as inline data. This confuses the code later because the inode now has both normal block and inline data allocated and the confusion manifests for example as: kernel BUG at fs/ext4/inode.c:2721! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 359 Comm: repro Not tainted 5.19.0-rc8-00001-g31ba1e3b8305-dirty #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 RIP: 0010:ext4_writepages+0x363d/0x3660 RSP: 0018:ffffc90000ccf260 EFLAGS: 00010293 RAX: ffffffff81e1abcd RBX: 0000008000000000 RCX: ffff88810842a180 RDX: 0000000000000000 RSI: 0000008000000000 RDI: 0000000000000000 RBP: ffffc90000ccf650 R08: ffffffff81e17d58 R09: ffffed10222c680b R10: dfffe910222c680c R11: 1ffff110222c680a R12: ffff888111634128 R13: ffffc90000ccf880 R14: 0000008410000000 R15: 0000000000000001 FS: 00007f72635d2640(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000565243379180 CR3: 000000010aa74000 CR4: 0000000000150eb0 Call Trace: <TASK> do_writepages+0x397/0x640 filemap_fdatawrite_wbc+0x151/0x1b0 file_write_and_wait_range+0x1c9/0x2b0 ext4_sync_file+0x19e/0xa00 vfs_fsync_range+0x17b/0x190 ext4_buffered_write_iter+0x488/0x530 ext4_file_write_iter+0x449/0x1b90 vfs_write+0xbcd/0xf40 ksys_write+0x198/0x2c0 __x64_sys_write+0x7b/0x90 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when we are doing direct IO write to a file. Cc: stable@kernel.org Reported-by: Tadeusz Struk <tadeusz.struk@linaro.org> Reported-by: syzbot+bd13648a53ed6933ca49@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Tested-by: Tadeusz Struk<tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220727155753.13969-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nmenon
pushed a commit
to nmenon/nm-beagle-linux
that referenced
this pull request
May 22, 2023
[ Upstream commit 93c660c ] ASAN reports an use-after-free in btf_dump_name_dups: ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928 READ of size 2 at 0xffff927006db thread T0 #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614) beagleboard#1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127 beagleboard#2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143 beagleboard#3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212 beagleboard#4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525 beagleboard#5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552 beagleboard#6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567 beagleboard#7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912 beagleboard#8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798 beagleboard#9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282 beagleboard#10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236 beagleboard#11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875 beagleboard#12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062 beagleboard#13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697 beagleboard#14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308 beagleboard#15 0xaaaab5d65990 (test_progs+0x185990) 0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0) freed by thread T0 here: #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4) beagleboard#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191 beagleboard#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163 beagleboard#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106 beagleboard#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157 beagleboard#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519 beagleboard#6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032 beagleboard#7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232 beagleboard#8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875 beagleboard#9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062 beagleboard#10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697 beagleboard#11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308 beagleboard#12 0xaaaab5d65990 (test_progs+0x185990) previously allocated by thread T0 here: #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4) beagleboard#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191 beagleboard#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163 beagleboard#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106 beagleboard#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157 beagleboard#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519 beagleboard#6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070 beagleboard#7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102 beagleboard#8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162 beagleboard#9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875 beagleboard#10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062 beagleboard#11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697 beagleboard#12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308 beagleboard#13 0xaaaab5d65990 (test_progs+0x185990) The reason is that the key stored in hash table name_map is a string address, and the string memory is allocated by realloc() function, when the memory is resized by realloc() later, the old memory may be freed, so the address stored in name_map references to a freed memory, causing use-after-free. Fix it by storing duplicated string address in name_map. Fixes: 919d2b1 ("libbpf: Allow modification of BTF and add btf__add_str API") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20221011120108.782373-2-xukuohai@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
nmenon
pushed a commit
to nmenon/nm-beagle-linux
that referenced
this pull request
May 22, 2023
…g the sock [ Upstream commit 3cf7203 ] There is a race condition in vxlan that when deleting a vxlan device during receiving packets, there is a possibility that the sock is released after getting vxlan_sock vs from sk_user_data. Then in later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got NULL pointer dereference. e.g. #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757 beagleboard#1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d beagleboard#2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48 beagleboard#3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b beagleboard#4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb beagleboard#5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542 beagleboard#6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62 [exception RIP: vxlan_ecn_decapsulate+0x3b] RIP: ffffffffc1014e7b RSP: ffffa25ec6978cb0 RFLAGS: 00010246 RAX: 0000000000000008 RBX: ffff8aa000888000 RCX: 0000000000000000 RDX: 000000000000000e RSI: ffff8a9fc7ab803e RDI: ffff8a9fd1168700 RBP: ffff8a9fc7ab803e R8: 0000000000700000 R9: 00000000000010ae R10: ffff8a9fcb748980 R11: 0000000000000000 R12: ffff8a9fd1168700 R13: ffff8aa000888000 R14: 00000000002a0000 R15: 00000000000010ae ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 beagleboard#7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan] beagleboard#8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507 beagleboard#9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45 beagleboard#10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807 beagleboard#11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951 beagleboard#12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde beagleboard#13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b beagleboard#14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139 beagleboard#15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a beagleboard#16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3 beagleboard#17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca beagleboard#18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3 Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh Fix this by waiting for all sk_user_data reader to finish before releasing the sock. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
nmenon
pushed a commit
to nmenon/nm-beagle-linux
that referenced
this pull request
May 22, 2023
[ Upstream commit caa4b35 ] If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that res.class contains a valid pointer Sample splat reported by Kyle Zeng [ 5.405624] 0: reclassify loop, rule prio 0, protocol 800 [ 5.406326] ================================================================== [ 5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0 [ 5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299 [ 5.408731] [ 5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ beagleboard#15 [ 5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 5.410439] Call Trace: [ 5.410764] dump_stack+0x87/0xcd [ 5.411153] print_address_description+0x7a/0x6b0 [ 5.411687] ? vprintk_func+0xb9/0xc0 [ 5.411905] ? printk+0x76/0x96 [ 5.412110] ? cbq_enqueue+0x54b/0xea0 [ 5.412323] kasan_report+0x17d/0x220 [ 5.412591] ? cbq_enqueue+0x54b/0xea0 [ 5.412803] __asan_report_load1_noabort+0x10/0x20 [ 5.413119] cbq_enqueue+0x54b/0xea0 [ 5.413400] ? __kasan_check_write+0x10/0x20 [ 5.413679] __dev_queue_xmit+0x9c0/0x1db0 [ 5.413922] dev_queue_xmit+0xc/0x10 [ 5.414136] ip_finish_output2+0x8bc/0xcd0 [ 5.414436] __ip_finish_output+0x472/0x7a0 [ 5.414692] ip_finish_output+0x5c/0x190 [ 5.414940] ip_output+0x2d8/0x3c0 [ 5.415150] ? ip_mc_finish_output+0x320/0x320 [ 5.415429] __ip_queue_xmit+0x753/0x1760 [ 5.415664] ip_queue_xmit+0x47/0x60 [ 5.415874] __tcp_transmit_skb+0x1ef9/0x34c0 [ 5.416129] tcp_connect+0x1f5e/0x4cb0 [ 5.416347] tcp_v4_connect+0xc8d/0x18c0 [ 5.416577] __inet_stream_connect+0x1ae/0xb40 [ 5.416836] ? local_bh_enable+0x11/0x20 [ 5.417066] ? lock_sock_nested+0x175/0x1d0 [ 5.417309] inet_stream_connect+0x5d/0x90 [ 5.417548] ? __inet_stream_connect+0xb40/0xb40 [ 5.417817] __sys_connect+0x260/0x2b0 [ 5.418037] __x64_sys_connect+0x76/0x80 [ 5.418267] do_syscall_64+0x31/0x50 [ 5.418477] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 5.418770] RIP: 0033:0x473bb7 [ 5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89 [ 5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7 [ 5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007 [ 5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004 [ 5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002 [ 5.422471] [ 5.422562] Allocated by task 299: [ 5.422782] __kasan_kmalloc+0x12d/0x160 [ 5.423007] kasan_kmalloc+0x5/0x10 [ 5.423208] kmem_cache_alloc_trace+0x201/0x2e0 [ 5.423492] tcf_proto_create+0x65/0x290 [ 5.423721] tc_new_tfilter+0x137e/0x1830 [ 5.423957] rtnetlink_rcv_msg+0x730/0x9f0 [ 5.424197] netlink_rcv_skb+0x166/0x300 [ 5.424428] rtnetlink_rcv+0x11/0x20 [ 5.424639] netlink_unicast+0x673/0x860 [ 5.424870] netlink_sendmsg+0x6af/0x9f0 [ 5.425100] __sys_sendto+0x58d/0x5a0 [ 5.425315] __x64_sys_sendto+0xda/0xf0 [ 5.425539] do_syscall_64+0x31/0x50 [ 5.425764] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 5.426065] [ 5.426157] The buggy address belongs to the object at ffff88800e312200 [ 5.426157] which belongs to the cache kmalloc-128 of size 128 [ 5.426955] The buggy address is located 42 bytes to the right of [ 5.426955] 128-byte region [ffff88800e312200, ffff88800e312280) [ 5.427688] The buggy address belongs to the page: [ 5.427992] page:000000009875fabc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe312 [ 5.428562] flags: 0x100000000000200(slab) [ 5.428812] raw: 0100000000000200 dead000000000100 dead000000000122 ffff888007843680 [ 5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff ffff88800e312401 [ 5.429875] page dumped because: kasan: bad access detected [ 5.430214] page->mem_cgroup:ffff88800e312401 [ 5.430471] [ 5.430564] Memory state around the buggy address: [ 5.430846] ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.431267] ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc [ 5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.432123] ^ [ 5.432391] ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc [ 5.432810] ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.433229] ================================================================== [ 5.433648] Disabling lock debugging due to kernel taint Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
nmenon
pushed a commit
to nmenon/nm-beagle-linux
that referenced
this pull request
May 22, 2023
commit c4b850d upstream. When gvt debug fs is destroyed, need to have a sane check if drm minor's debugfs root is still available or not, otherwise in case like device remove through unbinding, drm minor's debugfs directory has already been removed, then intel_gvt_debugfs_clean() would act upon dangling pointer like below oops. i915 0000:00:02.0: Direct firmware load for i915/gvt/vid_0x8086_did_0x1926_rid_0x0a.golden_hw_state failed with error -2 i915 0000:00:02.0: MDEV: Registered Console: switching to colour dummy device 80x25 i915 0000:00:02.0: MDEV: Unregistering BUG: kernel NULL pointer dereference, address: 00000000000000a0 PGD 0 P4D 0 Oops: 0002 [beagleboard#1] PREEMPT SMP PTI CPU: 2 PID: 2486 Comm: gfx-unbind.sh Tainted: G I 6.1.0-rc8+ beagleboard#15 Hardware name: Dell Inc. XPS 13 9350/0JXC1H, BIOS 1.13.0 02/10/2020 RIP: 0010:down_write+0x1f/0x90 Code: 1d ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 fb e8 62 c0 ff ff bf 01 00 00 00 e8 28 5e 31 ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 33 65 48 8b 04 25 c0 bd 01 00 48 89 43 08 bf 01 RSP: 0018:ffff9eb3036ffcc8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000000a0 RCX: ffffff8100000000 RDX: 0000000000000001 RSI: 0000000000000064 RDI: ffffffffa48787a8 RBP: ffff9eb3036ffd30 R08: ffffeb1fc45a0608 R09: ffffeb1fc45a05c0 R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000 R13: ffff91acc33fa328 R14: ffff91acc033f080 R15: ffff91acced533e0 FS: 00007f6947bba740(0000) GS:ffff91ae36d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a0 CR3: 00000001133a2002 CR4: 00000000003706e0 Call Trace: <TASK> simple_recursive_removal+0x9f/0x2a0 ? start_creating.part.0+0x120/0x120 ? _raw_spin_lock+0x13/0x40 debugfs_remove+0x40/0x60 intel_gvt_debugfs_clean+0x15/0x30 [kvmgt] intel_gvt_clean_device+0x49/0xe0 [kvmgt] intel_gvt_driver_remove+0x2f/0xb0 i915_driver_remove+0xa4/0xf0 i915_pci_remove+0x1a/0x30 pci_device_remove+0x33/0xa0 device_release_driver_internal+0x1b2/0x230 unbind_store+0xe0/0x110 kernfs_fop_write_iter+0x11b/0x1f0 vfs_write+0x203/0x3d0 ksys_write+0x63/0xe0 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6947cb5190 Code: 40 00 48 8b 15 71 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d 51 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 RSP: 002b:00007ffcbac45a28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f6947cb5190 RDX: 000000000000000d RSI: 0000555e35c866a0 RDI: 0000000000000001 RBP: 0000555e35c866a0 R08: 0000000000000002 R09: 0000555e358cb97c R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001 R13: 000000000000000d R14: 0000000000000000 R15: 0000555e358cb8e0 </TASK> Modules linked in: kvmgt CR2: 00000000000000a0 ---[ end trace 0000000000000000 ]--- Cc: Wang, Zhi <zhi.a.wang@intel.com> Cc: He, Yu <yu.he@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Fixes: bc7b0be ("drm/i915/gvt: Add basic debugfs infrastructure") Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20221219140357.769557-1-zhenyuw@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Mar 4, 2024
…s_del_by_dev() [ Upstream commit 01a564b ] I got the below warning trace: WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0 Call Trace: rtnl_dellink rtnetlink_rcv_msg netlink_rcv_skb netlink_unicast netlink_sendmsg __sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 entry_SYSCALL_64_after_hwframe It can be repoduced via: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode 0 ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 ip netns exec ns1 ip link set bond_slave_1 master bond0 [1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off [2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0 [3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0 [4] ip netns exec ns1 ip link set bond_slave_1 nomaster [5] ip netns exec ns1 ip link del veth2 ip netns del ns1 This is all caused by command [1] turning off the rx-vlan-filter function of bond0. The reason is the same as commit 01f4fd2 ("bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands [2] [3] add the same vid to slave and master respectively, causing command [4] to empty slave->vlan_info. The following command [5] triggers this problem. To fix this problem, we should add VLAN_FILTER feature checks in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect addition or deletion of vlan_vid information. Fixes: 348a144 ("vlan: introduce functions to do mass addition/deletion of vids by another device") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Nov 11, 2024
KASAN reports an out of bounds read: BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36 BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline] BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410 security/keys/permission.c:54 Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362 CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:123 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 __kuid_val include/linux/uidgid.h:36 [inline] uid_eq include/linux/uidgid.h:63 [inline] key_task_permission+0x394/0x410 security/keys/permission.c:54 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793 This issue was also reported by syzbot. It can be reproduced by following these steps(more details [1]): 1. Obtain more than 32 inputs that have similar hashes, which ends with the pattern '0xxxxxxxe6'. 2. Reboot and add the keys obtained in step 1. The reproducer demonstrates how this issue happened: 1. In the search_nested_keyrings function, when it iterates through the slots in a node(below tag ascend_to_node), if the slot pointer is meta and node->back_pointer != NULL(it means a root), it will proceed to descend_to_node. However, there is an exception. If node is the root, and one of the slots points to a shortcut, it will be treated as a keyring. 2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function. However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as ASSOC_ARRAY_PTR_SUBTYPE_MASK. 3. When 32 keys with the similar hashes are added to the tree, the ROOT has keys with hashes that are not similar (e.g. slot 0) and it splits NODE A without using a shortcut. When NODE A is filled with keys that all hashes are xxe6, the keys are similar, NODE A will split with a shortcut. Finally, it forms the tree as shown below, where slot 6 points to a shortcut. NODE A +------>+---+ ROOT | | 0 | xxe6 +---+ | +---+ xxxx | 0 | shortcut : : xxe6 +---+ | +---+ xxe6 : : | | | xxe6 +---+ | +---+ | 6 |---+ : : xxe6 +---+ +---+ xxe6 : : | f | xxe6 +---+ +---+ xxe6 | f | +---+ 4. As mentioned above, If a slot(slot 6) of the root points to a shortcut, it may be mistakenly transferred to a key*, leading to a read out-of-bounds read. To fix this issue, one should jump to descend_to_node if the ptr is a shortcut, regardless of whether the node is root or not. [1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/ [jarkko: tweaked the commit message a bit to have an appropriate closes tag.] Fixes: b2a4df2 ("KEYS: Expand the capacity of a keyring") Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Mar 17, 2025
[ Upstream commit 1626e5e ] While performing the rq locking dance in dispatch_to_local_dsq(), we may trigger the following lock imbalance condition, in particular when multiple tasks are rapidly changing CPU affinity (i.e., running a `stress-ng --race-sched 0`): [ 13.413579] ===================================== [ 13.413660] WARNING: bad unlock balance detected! [ 13.413729] 6.13.0-virtme #15 Not tainted [ 13.413792] ------------------------------------- [ 13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at: [ 13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0 [ 13.414111] but there are no more locks to release! [ 13.414176] [ 13.414176] other info that might help us debug this: [ 13.414258] 1 lock held by kworker/1:1/80: [ 13.414318] #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90 [ 13.414612] [ 13.414612] stack backtrace: [ 13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15 [ 13.415505] Workqueue: 0x0 (events) [ 13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms [ 13.415570] Call Trace: [ 13.415700] <TASK> [ 13.415744] dump_stack_lvl+0x78/0xe0 [ 13.415806] ? dispatch_to_local_dsq+0x108/0x1a0 [ 13.415884] print_unlock_imbalance_bug+0x11b/0x130 [ 13.415965] ? dispatch_to_local_dsq+0x108/0x1a0 [ 13.416226] lock_release+0x231/0x2c0 [ 13.416326] _raw_spin_unlock+0x1b/0x40 [ 13.416422] dispatch_to_local_dsq+0x108/0x1a0 [ 13.416554] flush_dispatch_buf+0x199/0x1d0 [ 13.416652] balance_one+0x194/0x370 [ 13.416751] balance_scx+0x61/0x1e0 [ 13.416848] prev_balance+0x43/0xb0 [ 13.416947] __pick_next_task+0x6b/0x1b0 [ 13.417052] __schedule+0x20d/0x1740 This happens because dispatch_to_local_dsq() is racing with dispatch_dequeue() and, when the latter wins, we incorrectly assume that the task has been moved to dst_rq. Fix by properly tracking the currently locked rq. Fixes: 4d3ca89 ("sched_ext: Refactor consume_remote_task()") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Jul 11, 2025
[ Upstream commit 88f7f56 ] When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait(). An example from v5.4, similar problem also exists in upstream: crash> bt 2091206 PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0" #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8 #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4 #2 [ffff800084a2f880] schedule at ffff800040bfa4b4 #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4 #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0 #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254 #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38 #8 [ffff800084a2fa60] generic_make_request at ffff800040570138 #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4 #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs] #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs] #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs] #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs] #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs] #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs] #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08 #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc #18 [ffff800084a2fe70] kthread at ffff800040118de4 After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But due to the existence of the dm layer, throttling flush_bio indirectly causes the metadata bio to be throttled. Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false to avoid wbt_wait(). Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Tianxiang Peng <txpeng@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Jul 31, 2025
[ Upstream commit eedf3e3 ] ACPICA commit 1c28da2242783579d59767617121035dafba18c3 This was originally done in NetBSD: NetBSD/src@b69d1ac and is the correct alternative to the smattering of `memcpy`s I previously contributed to this repository. This also sidesteps the newly strict checks added in UBSAN: llvm/llvm-project@7926744 Before this change we see the following UBSAN stack trace in Fuchsia: #0 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e #1.2 0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c #1.1 0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c #1 0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c #2 0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f #3 0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723 #4 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e #5 0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089 #6 0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169 #7 0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a #8 0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7 #9 0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979 #10 0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f #11 0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf #12 0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278 #13 0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87 #14 0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d #15 0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e #16 0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad #17 0x000021afcf9e313e in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e #18 0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7 #19 0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342 #20 0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3 #21 0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616 #22 0x00002290f9924323 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323 #23 0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76 #24 0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831 #25 0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc #26 0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58 #27 0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159 #28 0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414 #29 0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d #30 0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7 #31 0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66 #32 0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9 #33 0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d #34 0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983 #35 0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e #36 0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509 #37 0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958 #38 0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247 #39 0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962 #40 0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30 #41 0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d Link: acpica/acpica@1c28da22 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/4664267.LvFx2qVVIh@rjwysocki.net Signed-off-by: Tamir Duberstein <tamird@gmail.com> [ rjw: Pick up the tag from Tamir ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Nov 26, 2025
[ Upstream commit 48918ca ] The test starts a workload and then opens events. If the events fail to open, for example because of perf_event_paranoid, the gopipe of the workload is leaked and the file descriptor leak check fails when the test exits. To avoid this cancel the workload when opening the events fails. Before: ``` $ perf test -vv 7 7: PERF_RECORD_* events & perf_sample fields: --- start --- test child forked, pid 1189568 Using CPUID GenuineIntel-6-B7-1 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/) disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 sys_perf_event_open failed, error -13 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/) disabled 1 exclude_kernel 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/) disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 sys_perf_event_open failed, error -13 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/) disabled 1 exclude_kernel 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3 Attempt to add: software/cpu-clock/ ..after resolving event: software/config=0/ cpu-clock -> software/cpu-clock/ ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0x9 (PERF_COUNT_SW_DUMMY) sample_type IP|TID|TIME|CPU read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 enable_on_exec 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 { wakeup_events, wakeup_watermark } 1 ------------------------------------------------------------ sys_perf_event_open: pid 1189569 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -13 perf_evlist__open: Permission denied ---- end(-2) ---- Leak of file descriptor 6 that opened: 'pipe:[14200347]' ---- unexpected signal (6) ---- iFailed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon Failed to read build ID for //anon #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311 #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0 #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44 #3 0x7f29ce849cc2 in raise raise.c:27 #4 0x7f29ce8324ac in abort abort.c:81 #5 0x565358f662d4 in check_leaks builtin-test.c:226 #6 0x565358f6682e in run_test_child builtin-test.c:344 #7 0x565358ef7121 in start_command run-command.c:128 #8 0x565358f67273 in start_test builtin-test.c:545 #9 0x565358f6771d in __cmd_test builtin-test.c:647 #10 0x565358f682bd in cmd_test builtin-test.c:849 #11 0x565358ee5ded in run_builtin perf.c:349 #12 0x565358ee6085 in handle_internal_command perf.c:401 #13 0x565358ee61de in run_argv perf.c:448 #14 0x565358ee6527 in main perf.c:555 #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74 #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128 #17 0x565358e391c1 in _start perf[851c1] 7: PERF_RECORD_* events & perf_sample fields : FAILED! ``` After: ``` $ perf test 7 7: PERF_RECORD_* events & perf_sample fields : Skip (permissions) ``` Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RobertCNelson
pushed a commit
that referenced
this pull request
Nov 26, 2025
commit 1d3ad18 upstream. syzbot reported a BUG_ON in ext4_es_cache_extent() when opening a verity file on a corrupted ext4 filesystem mounted without a journal. The issue is that the filesystem has an inode with both the INLINE_DATA and EXTENTS flags set: EXT4-fs error (device loop0): ext4_cache_extents:545: inode #15: comm syz.0.17: corrupted extent tree: lblk 0 < prev 66 Investigation revealed that the inode has both flags set: DEBUG: inode 15 - flag=1, i_inline_off=164, has_inline=1, extents_flag=1 This is an invalid combination since an inode should have either: - INLINE_DATA: data stored directly in the inode - EXTENTS: data stored in extent-mapped blocks Having both flags causes ext4_has_inline_data() to return true, skipping extent tree validation in __ext4_iget(). The unvalidated out-of-order extents then trigger a BUG_ON in ext4_es_cache_extent() due to integer underflow when calculating hole sizes. Fix this by detecting this invalid flag combination early in ext4_iget() and rejecting the corrupted inode. Cc: stable@kernel.org Reported-and-tested-by: syzbot+038b7bf43423e132b308@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=038b7bf43423e132b308 Suggested-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Message-ID: <20250930112810.315095-1-kartikey406@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The current system of setting channels to inactive
past the last serializer can close ongoing serializers
from a different stream. This fix sets serialisers of
a specific stream to inactive at the moment that stream
is closed.
Signed-off-by: Cor Peters corpeters@incas3.nl