build(deps): bump certifi from 2023.7.22 to 2024.7.4 in /drivers/gpu/drm/ci/xfails#9
Open
dependabot[bot] wants to merge 37 commits intomasterfrom
Open
Conversation
Commit 343f4c4 ("kthread: Don't allocate kthread_struct for init and umh") introduces a new function user_mode_thread() for init and umh. init and umh are different from typical kernel threads since the don't need a "kthread" struct and they will finally become user processes by calling kernel_execve(), but on the other hand, they are also different from typical user mode threads (they have no "mm" structs at creation time, which is traditionally used to distinguish a user thread and a kernel thread). So I think it is reasonable to treat init and umh as "special kernel threads". Then let's unify the kernel_thread() and user_mode_thread() to kernel_thread() again, and add a new 'user' parameter for init and umh. This also makes code simpler. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
…rqs() Adjust the return value semanteme of msi_domain_prepare_irqs(), which allows us to modify the input nvec by overriding the msi_domain_ops:: msi_prepare(). This is necessary for the later patch. Before: 0 on success, others on error. After: = 0: Success; > 0: The modified nvec; < 0: Error code. Callers are also updated. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Loongson machines can have as many as 256 logical cpus, but the maximum of msi vectors in one irqchip is also 256 (practically that is less than 256, because pch-pic consumes some of them). Even on a 64-core machine, 256 irqs can be easily exhausted if there are several NICs (NICs usually allocate msi irqs depending on the number of online cpus). So we want to limit the msi allocation. In this patch we add a machanism to limit msi allocation: 1, Modify input "nvec" by overriding the msi_domain_ops::msi_prepare(); 2, The default limit is 256, which is compatible with the old behavior; 3, Add a cmdline parameter "loongson_msi_limit=xxx" to control the limit. Signed-off-by: Juxin Gao <gaojuxin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On multiple bridge platform, a MSIX vector is often affinitive with only one cpu, and the count of MSIX is determined as the count of cpus in the system. Unfortunately, the cpu group related to a brigde is only allowed to handle interrupts from devices behind the bridge, which breaks the normal affinity setting for multiple MSIX vectors, and causing following affinity setting: IRQ: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 CPU: 0, 1, 2, 3, 4, 0, 0, 0, 0, 0 To balance the affinity, we improve the setting to assign cpu for IRQ as following: IRQ: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 CPU: 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When the best selected CPU is offline, work_on_cpu() will stuck forever. This can be happen if a node is online while all its CPUs are offline (we can use "maxcpus=1" without "nr_cpus=1" to reproduce it), Therefore, in this case, we should call local_pci_probe() instead of work_on_cpu(). Cc: <stable@vger.kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
LS7A chipset can be used as a downstream bridge which connected to a high-level host bridge. In this case DEV_LS7A_PCIE_PORT5 is used as the upward port. We should always enable MSI caps of this port, otherwise downstream devices cannot use MSI. Cc: <stable@vger.kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Don't limit MRRS during resume, so that saved value can be restored, otherwise the MRRS will become the minimum value after resume. Cc: <stable@vger.kernel.org> Fixes: 8b3517f ("PCI: loongson: Prevent LS7A MRRS increases") Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Chromium sandbox apparently wants to deny statx [1] so it could properly inspect arguments after the sandboxed process later falls back to fstat. Because there's currently not a "fd-only" version of statx, so that the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat's into fstat instead which is allowed and has the same type of return value. But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there! Actually, back when the LoongArch port was under review, people were aware of the same problem with sandboxing clone3 [3], so clone was eventually kept. Unfortunately it seemed at that time no one had noticed statx, so besides restoring fstat/newfstatat to LoongArch uapi (and postponing the problem further), it seems inevitable that we would need to tackle seccomp deep argument inspection. However, this is obviously a decision that shouldn't be taken lightly, so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT in unistd.h. This is the simplest solution for now, and so we hope the community will tackle the long-standing problem of seccomp deep argument inspection in the future [4][5]. More infomation please reading this thread [6]. [1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150 [2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355 [3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/ [4] https://lwn.net/Articles/799557/ [5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf [6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@brauner/T/#t Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Some drivers want to use cpu_logical_map(), early_cpu_to_node() and some other CPU mapping APIs, even if we use "nr_cpus=1" to hard limit the CPU number. This is strongly required for the multi-bridges machines. Currently, we stop parsing the MADT if the nr_cpus limit is reached, but to achieve the above goal we should always enumerate the MADT table and setup logical-physical CPU mapping whether there is a nr_cpus limit. Rework the MADT enumeration: 1. Define a flag "cpu_enumerated" to distinguish the first enumeration (cpu_enumerated=0) and the physical hotplug case (cpu_enumerated=1) for set_processor_mask(). 2. If cpu_enumerated=0, stop parsing only when NR_CPUS limit is reached, so we can setup logical-physical CPU mapping; if cpu_enumerated=1, stop parsing when nr_cpu_ids limit is reached, so we can avoid some runtime bugs. Once logical-physical CPU mapping is setup, we will let cpu_enumerated=1. 3. Use find_first_zero_bit() instead of cpumask_next_zero() to find the next zero bit (free logical CPU id) in the cpu_present_mask, because cpumask_next_zero() will stop at nr_cpu_ids. 4. Only touch cpu_possible_mask if cpu_enumerated=0, this is in order to avoid some potential crashes, because cpu_possible_mask is marked as __ro_after_init. 5. In prefill_possible_map(), clear cpu_present_mask bits greater than nr_cpu_ids, in order to avoid a CPU be "present" but not "possible". Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add irq_work support for LoongArch via self IPIs. This make it possible to run works in hardware interrupt context, which is a prerequisite for NOHZ_FULL. Implement: - arch_irq_work_raise() - arch_irq_work_has_interrupt() Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In order for things like get_user_pages() to work on ZONE_DEVICE memory, we need a software PTE bit to identify device-backed PFNs. Hook this up along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add ARCH_HAS_DEBUG_VM_PGTABLE selection in Kconfig, in order to make corresponding vm debug features usable on LoongArch. Also update the corresponding arch-support.txt document. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Currently, only TLB-based ioremap() support writecombine, so add the counterpart for DMW-based ioremap() with help of DMW2. The base address (WRITECOMBINE_BASE) is configured as 0xa000000000000000. DMW3 is unused by kernel now, however firmware may leave garbage in them and interfere kernel's address mapping. So clear it as necessary. BTW, centralize the DMW configuration to macro SETUP_DMWINS. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Most LoongArch 64 machines are using custom "SADR" ACPI extension to perform ACPI S3 sleep. However the standard ACPI way to perform sleep is to write a value to ACPI PM1/SLEEP_CTL register, and this is never supported properly in kernel. Add standard S3 sleep by providing a default DoSuspend function which calls ACPI's acpi_enter_sleep_state() routine when SADR is not provided by the firmware. Also fix suspend assembly code so that ra is set properly before go into sleep routine. (Previously linked address of jirl was set to a0, some firmware do require return address in a0 but it's already set with la.pcrel before). Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Hibernation assumes the memory layout after resume be the same as that before sleep, so it expects the kernel is loaded at the same position. To achieve this goal we automatically disable KASLR if user explicitly requests hibernation via the "resume=" command line. Since "nohibernate" and "noresume" have higher priorities than "resume=", we only disable KASLR if there is no "nohibernate" and "noresume". Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
fw_arg1 is in memory space rather than I/O space, so we should use early_memremap_ro() instead of early_ioremap() to map the cmdline. Moreover, we should unmap it after using. Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
-Zdirect-access-external-data is a new Rust compiler option added in Rust 1.78, which we use to optimize the access of external data in the Linux kernel's Rust code. This patch modifies the Rust code in vmlinux to directly access externa data, using PC-REL instead of GOT. However, Rust code whithin modules is constrained by the PC-REL addressing range and is explicitly set to use an indirect method. Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch defines UPROBE_SWBP_INSN as a function call and this breaks arch_uprobe_trampoline() which uses it to initialize a static variable. Add the new "__builtin_constant_p" helper, __emit_break(), and redefine the current users of larch_insn_gen_break() to use it. Fixes: ff474a7 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20240614174822.GA1185149@thelio-3990X/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This add CPU HWMon (temperature sensor) platform driver for Loongson-3. Tested-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected, cpu_max_bits_warn() generates a runtime warning similar as below while we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit) instead of NR_CPUS to iterate CPUs. [ 3.052463] ------------[ cut here ]------------ [ 3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0 [ 3.070072] Modules linked in: efivarfs autofs4 [ 3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052 [ 3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000 [ 3.109127] 9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430 [ 3.118774] 90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff [ 3.128412] 0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890 [ 3.138056] 0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa [ 3.147711] ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000 [ 3.157364] 900000000101c998 0000000000000004 9000000000ef7430 0000000000000000 [ 3.167012] 0000000000000009 000000000000006c 0000000000000000 0000000000000000 [ 3.176641] 9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286 [ 3.186260] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c [ 3.195868] ... [ 3.199917] Call Trace: [ 3.203941] [<90000000002086d8>] show_stack+0x38/0x14c [ 3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88 [ 3.217625] [<900000000023d268>] __warn+0xd0/0x100 [ 3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc [ 3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0 [ 3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4 [ 3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4 [ 3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0 [ 3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100 [ 3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94 [ 3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160 [ 3.281824] ---[ end trace 8b484262b4b8c24c ]--- Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
If an NTFS file system is mounted to another system with different
PAGE_SIZE from the original system, log->page_size will change in
log_replay(), but log->page_{mask,bits} don't change correspondingly.
This will cause a panic because "u32 bytes = log->page_size - page_off"
will get a negative value in the later read_log_page().
Cc: stable@vger.kernel.org
Fixes: b46acd6 ("fs/ntfs3: Add NTFS journal")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The label end_reply is obviously a typo. It should be "replay" in this context. So rename end_reply to end_replay. Cc: stable@vger.kernel.org Fixes: b46acd6 ("fs/ntfs3: Add NTFS journal") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Debug machanism include: gdb, ftrace, kprobe, uprobe and jump_label Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add driver support for the IOMMU in LS7A. If you need to enable it, please add the loongson_iommu=on kernel parameter. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Consider a configuration like this: 1, efifb (or simpledrm) is built-in; 2, a native display driver (such as radeon) is also built-in. As Javier said, this is not a common configuration (the native display driver is usually built as a module), but it can happen and cause some trouble. In this case, since efifb, radeon and sysfb are all in device_initcall() level, the order in practise is like this: efifb registered at first, but no "efi-framebuffer" device yet. radeon registered later, and /dev/fb0 created. sysfb_init() comes at last, it registers "efi-framebuffer" and then causes an error message "efifb: a framebuffer is already registered". Make sysfb_init() to be subsys_ initcall_sync() can avoid this. And Javier Martinez Canillas is trying to make a more general solution in commit 873eb3b ("fbdev: Disable sysfb device registration when removing conflicting FBs"). However, this patch still makes sense because it can make the screen display as early as possible (We cannot move to subsys_initcall, since sysfb_init() should be executed after PCI enumeration). This is a better version of commit 60aebc9 ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync") since the previous commit leads to blank displays on some systems. The reason is that vgaarb initialization is also a subsys_initcall_sync function so sysfb_disable() is sometimes missed. So here we move sysfb_init() to an fs_initcall function which is ensured after vgaarb initialization. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
After commit 60aebc9 ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank screen until the display manager starts. This regression occurs with such a Kconfig combination: CONFIG_SYSFB=y CONFIG_SYSFB_SIMPLEFB=y CONFIG_DRM_SIMPLEDRM=y CONFIG_DRM_I915=y # Or other native drivers such as radeon, amdgpu If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same device), there is no blank screen. The root cause is the initialization order, and this order depends on the Makefile. FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915 will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM, DRM_SIMPLEDRM will try to takeover I915, but fails to work. So we can move the "tiny" directory before native DRM drivers to solve this problem. Fixes: 60aebc9 ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync") Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t Reported-by: Jaak Ristioja <jaak@ristioja.ee> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Radeon driver can not handle the interrupt is faster than DMA data, so irq handler must update an old ih.rptr value in IH_RB_RPTR register to enable interrupt again when interrupt is faster than DMA data. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Zhijie Zhang <zhangzhijie@loongson.cn>
chenhuacai
pushed a commit
that referenced
this pull request
Oct 29, 2024
There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
d8dcf28 to
776b61b
Compare
a853793 to
f0bd1eb
Compare
6687637 to
1d69018
Compare
chenhuacai
pushed a commit
that referenced
this pull request
Dec 7, 2024
Hou Tao says: ==================== This patch set fixes several issues for LPM trie. These issues were found during adding new test cases or were reported by syzbot. The patch set is structured as follows: Patch #1~#2 are clean-ups for lpm_trie_update_elem(). Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie. Patch #4 fixes the accounting of n_entries when doing in-place update. Patch #5 fixes the exact match condition in trie_get_next_key() and it may skip keys when the passed key is not found in the map. Patch #6~#7 switch from kmalloc() to bpf memory allocator for LPM trie to fix several lock order warnings reported by syzbot. It also enables raw_spinlock_t for LPM trie again. After these changes, the LPM trie will be closer to being usable in any context (though the reentrance check of trie->lock is still missing, but it is on my todo list). Patch #8: move test_lpm_map to map_tests to make it run regularly. Patch #9: add test cases for the issues fixed by patch #3~#5. Please see individual patches for more details. Comments are always welcome. Change Log: v3: * patch #2: remove the unnecessary NULL-init for im_node * patch #6: alloc the leaf node before disabling IRQ to low the possibility of -ENOMEM when leaf_size is large; Free these nodes outside the trie lock (Suggested by Alexei) * collect review and ack tags (Thanks for Toke & Daniel) v2: https://lore.kernel.org/bpf/20241127004641.1118269-1-houtao@huaweicloud.com/ * collect review tags (Thanks for Toke) * drop "Add bpf_mem_cache_is_mergeable() helper" patch * patch #3~#4: add fix tag * patch #4: rename the helper to trie_check_add_elem() and increase n_entries in it. * patch #6: use one bpf mem allocator and update commit message to clarify that using bpf mem allocator is more appropriate. * patch #7: update commit message to add the possible max running time for update operation. * patch #9: update commit message to specify the purpose of these test cases. v1: https://lore.kernel.org/bpf/20241118010808.2243555-1-houtao@huaweicloud.com/ ==================== Link: https://lore.kernel.org/all/20241206110622.1161752-1-houtao@huaweicloud.com/ Signed-off-by: Alexei Starovoitov <ast@kernel.org>
chenhuacai
pushed a commit
that referenced
this pull request
Dec 8, 2024
Kernel will hang on destroy admin_q while we create ctrl failed, such as following calltrace: PID: 23644 TASK: ff2d52b40f439fc0 CPU: 2 COMMAND: "nvme" #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15 #1 [ff61d23de260fc08] schedule at ffffffff8323c014 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25 RIP: 00007fda7891d574 RSP: 00007ffe2ef06958 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 000055e8122a4d90 RCX: 00007fda7891d574 RDX: 000000000000012b RSI: 000055e8122a4d90 RDI: 0000000000000004 RBP: 00007ffe2ef079c0 R8: 000000000000012b R9: 000055e8122a4d90 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004 R13: 000055e8122923c0 R14: 000000000000012b R15: 00007fda78a54500 ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b This due to we have quiesced admi_q before cancel requests, but forgot to unquiesce before destroy it, as a result we fail to drain the pending requests, and hang on blk_mq_freeze_queue_wait() forever. Here try to reuse nvme_rdma_teardown_admin_queue() to fix this issue and simplify the code. Fixes: 958dc1d ("nvme-rdma: add clean action for failed reconnection") Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com> Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com> Signed-off-by: Yue.zhao <yue.zhao@shopee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
46867e4 to
99cb531
Compare
chenhuacai
pushed a commit
that referenced
this pull request
Dec 15, 2024
Its used from trace__run(), for the 'perf trace' live mode, i.e. its
strace-like, non-perf.data file processing mode, the most common one.
The trace__run() function will set trace->host using machine__new_host()
that is supposed to give a machine instance representing the running
machine, and since we'll use perf_env__arch_strerrno() to get the right
errno -> string table, we need to use machine->env, so initialize it in
machine__new_host().
Before the patch:
(gdb) run trace --errno-summary -a sleep 1
<SNIP>
Summary of events:
gvfs-afc-volume (3187), 2 events, 0.0%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
pselect6 1 0 0.000 0.000 0.000 0.000 0.00%
GUsbEventThread (3519), 2 events, 0.0%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
poll 1 0 0.000 0.000 0.000 0.000 0.00%
<SNIP>
Program received signal SIGSEGV, Segmentation fault.
0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
478 if (env->arch_strerrno == NULL)
(gdb) bt
#0 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
#1 0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673
#2 0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708
#3 0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747
#4 0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456
#5 0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487
#6 0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351
#7 0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404
#8 0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448
#9 0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560
(gdb)
After:
root@number:~# perf trace -a --errno-summary sleep 1
<SNIP>
pw-data-loop (2685), 1410 events, 16.0%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
epoll_wait 188 0 983.428 0.000 5.231 15.595 8.68%
ioctl 94 0 0.811 0.004 0.009 0.016 2.82%
read 188 0 0.322 0.001 0.002 0.006 5.15%
write 141 0 0.280 0.001 0.002 0.018 8.39%
timerfd_settime 94 0 0.138 0.001 0.001 0.007 6.47%
gnome-control-c (179406), 1848 events, 20.9%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
poll 222 0 959.577 0.000 4.322 21.414 11.40%
recvmsg 150 0 0.539 0.001 0.004 0.013 5.12%
write 300 0 0.442 0.001 0.001 0.007 3.29%
read 150 0 0.183 0.001 0.001 0.009 5.53%
getpid 102 0 0.101 0.000 0.001 0.008 7.82%
root@number:~#
Fixes: 54373b5 ("perf env: Introduce perf_env__arch_strerrno()")
Reported-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
8574ea2 to
9fa5382
Compare
chenhuacai
pushed a commit
that referenced
this pull request
Dec 22, 2024
Chase reports that their tester complaints about a locking context mismatch: ============================= [ BUG: Invalid wait context ] 6.13.0-rc1-gf137f14b7ccb-dirty #9 Not tainted ----------------------------- syz.1.25198/182604 is trying to lock: ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: spin_lock_irq include/linux/spinlock.h:376 [inline] ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: io_match_task_safe io_uring/io_uring.c:218 [inline] ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204 other info that might help us debug this: context-{5:5} 1 lock held by syz.1.25198/182604: #0: ffff88802b7d48c0 (&acct->lock){+.+.}-{2:2}, at: io_acct_cancel_pending_work+0x2d/0x6b0 io_uring/io-wq.c:1049 stack backtrace: CPU: 0 UID: 0 PID: 182604 Comm: syz.1.25198 Not tainted 6.13.0-rc1-gf137f14b7ccb-dirty #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x82/0xd0 lib/dump_stack.c:120 print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline] check_wait_context kernel/locking/lockdep.c:4898 [inline] __lock_acquire+0x883/0x3c80 kernel/locking/lockdep.c:5176 lock_acquire.part.0+0x11b/0x370 kernel/locking/lockdep.c:5849 __raw_spin_lock_irq include/linux/spinlock_api_smp.h:119 [inline] _raw_spin_lock_irq+0x36/0x50 kernel/locking/spinlock.c:170 spin_lock_irq include/linux/spinlock.h:376 [inline] io_match_task_safe io_uring/io_uring.c:218 [inline] io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204 io_acct_cancel_pending_work+0xb8/0x6b0 io_uring/io-wq.c:1052 io_wq_cancel_pending_work io_uring/io-wq.c:1074 [inline] io_wq_cancel_cb+0xb0/0x390 io_uring/io-wq.c:1112 io_uring_try_cancel_requests+0x15e/0xd70 io_uring/io_uring.c:3062 io_uring_cancel_generic+0x6ec/0x8c0 io_uring/io_uring.c:3140 io_uring_files_cancel include/linux/io_uring.h:20 [inline] do_exit+0x494/0x27a0 kernel/exit.c:894 do_group_exit+0xb3/0x250 kernel/exit.c:1087 get_signal+0x1d77/0x1ef0 kernel/signal.c:3017 arch_do_signal_or_restart+0x79/0x5b0 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xd8/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f which is because io_uring has ctx->timeout_lock nesting inside the io-wq acct lock, the latter of which is used from inside the scheduler and hence is a raw spinlock, while the former is a "normal" spinlock and can hence be sleeping on PREEMPT_RT. Change ctx->timeout_lock to be a raw spinlock to solve this nesting dependency on PREEMPT_RT=y. Reported-by: chase xd <sl1589472800@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
cd3a92e to
d42c90a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bumps certifi from 2023.7.22 to 2024.7.4.
Commits
bd815382024.07.04 (#295)06a2cbfBump peter-evans/create-pull-request from 6.0.5 to 6.1.0 (#294)13bba02Bump actions/checkout from 4.1.6 to 4.1.7 (#293)e8abcd0Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0 (#292)124f4ad2024.06.02 (#291)c2196ce--- (#290)fefdeecBump actions/checkout from 4.1.4 to 4.1.5 (#289)3c5fb15Bump actions/download-artifact from 4.1.6 to 4.1.7 (#286)4a9569aBump actions/checkout from 4.1.2 to 4.1.4 (#287)1fc8086Bump peter-evans/create-pull-request from 6.0.4 to 6.0.5 (#288)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)You can disable automated security fix PRs for this repo from the Security Alerts page.