Skip to content

build(deps): bump certifi from 2023.7.22 to 2024.7.4 in /drivers/gpu/drm/ci/xfails#9

Open
dependabot[bot] wants to merge 37 commits intomasterfrom
dependabot/pip/drivers/gpu/drm/ci/xfails/certifi-2024.7.4
Open

build(deps): bump certifi from 2023.7.22 to 2024.7.4 in /drivers/gpu/drm/ci/xfails#9
dependabot[bot] wants to merge 37 commits intomasterfrom
dependabot/pip/drivers/gpu/drm/ci/xfails/certifi-2024.7.4

Conversation

@dependabot
Copy link

@dependabot dependabot bot commented on behalf of github Jul 6, 2024

Bumps certifi from 2023.7.22 to 2024.7.4.

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

chenhuacai and others added 30 commits July 5, 2024 17:48
Commit 343f4c4 ("kthread: Don't allocate kthread_struct for init
and umh") introduces a new function user_mode_thread() for init and umh.

init and umh are different from typical kernel threads since the don't
need a "kthread" struct and they will finally become user processes by
calling kernel_execve(), but on the other hand, they are also different
from typical user mode threads (they have no "mm" structs at creation
time, which is traditionally used to distinguish a user thread and a
kernel thread).

So I think it is reasonable to treat init and umh as "special kernel
threads". Then let's unify the kernel_thread() and user_mode_thread()
to kernel_thread() again, and add a new 'user' parameter for init and
umh.

This also makes code simpler.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
…rqs()

Adjust the return value semanteme of msi_domain_prepare_irqs(), which
allows us to modify the input nvec by overriding the msi_domain_ops::
msi_prepare(). This is necessary for the later patch.

Before:
0 on success, others on error.

After:
= 0: Success;
> 0: The modified nvec;
< 0: Error code.

Callers are also updated.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Loongson machines can have as many as 256 logical cpus, but the maximum
of msi vectors in one irqchip is also 256 (practically that is less than
256, because pch-pic consumes some of them). Even on a 64-core machine,
256 irqs can be easily exhausted if there are several NICs (NICs usually
allocate msi irqs depending on the number of online cpus). So we want to
limit the msi allocation.

In this patch we add a machanism to limit msi allocation:
1, Modify input "nvec" by overriding the msi_domain_ops::msi_prepare();
2, The default limit is 256, which is compatible with the old behavior;
3, Add a cmdline parameter "loongson_msi_limit=xxx" to control the limit.

Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On multiple bridge platform, a MSIX vector is often affinitive with only
one cpu, and the count of MSIX is determined as the count of cpus in the
system. Unfortunately, the cpu group related to a brigde is only allowed
to handle interrupts from devices behind the bridge, which breaks the
normal affinity setting for multiple MSIX vectors, and causing following
affinity setting:

IRQ: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
CPU: 0, 1, 2, 3, 4, 0,  0,  0,  0,  0

To balance the affinity, we improve the setting to assign cpu for IRQ as
following:

IRQ: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
CPU: 0, 1, 2, 3, 4, 0,  1,  2,  3,  4

Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When the best selected CPU is offline, work_on_cpu() will stuck forever.
This can be happen if a node is online while all its CPUs are offline
(we can use "maxcpus=1" without "nr_cpus=1" to reproduce it), Therefore,
in this case, we should call local_pci_probe() instead of work_on_cpu().

Cc: <stable@vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
LS7A chipset can be used as a downstream bridge which connected to a
high-level host bridge. In this case DEV_LS7A_PCIE_PORT5 is used as the
upward port. We should always enable MSI caps of this port, otherwise
downstream devices cannot use MSI.

Cc: <stable@vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Don't limit MRRS during resume, so that saved value can be restored,
otherwise the MRRS will become the minimum value after resume.

Cc: <stable@vger.kernel.org>
Fixes: 8b3517f ("PCI: loongson: Prevent LS7A MRRS increases")
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Chromium sandbox apparently wants to deny statx [1] so it could properly
inspect arguments after the sandboxed process later falls back to fstat.
Because there's currently not a "fd-only" version of statx, so that the
sandbox has no way to ensure the path argument is empty without being
able to peek into the sandboxed process's memory. For architectures able
to do newfstatat though, glibc falls back to newfstatat after getting
-ENOSYS for statx, then the respective SIGSYS handler [2] takes care of
inspecting the path argument, transforming allowed newfstatat's into
fstat instead which is allowed and has the same type of return value.

But, as LoongArch is the first architecture to not have fstat nor
newfstatat, the LoongArch glibc does not attempt falling back at all
when it gets -ENOSYS for statx -- and you see the problem there!

Actually, back when the LoongArch port was under review, people were
aware of the same problem with sandboxing clone3 [3], so clone was
eventually kept. Unfortunately it seemed at that time no one had noticed
statx, so besides restoring fstat/newfstatat to LoongArch uapi (and
postponing the problem further), it seems inevitable that we would need
to tackle seccomp deep argument inspection.

However, this is obviously a decision that shouldn't be taken lightly,
so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT
in unistd.h. This is the simplest solution for now, and so we hope the
community will tackle the long-standing problem of seccomp deep argument
inspection in the future [4][5].

More infomation please reading this thread [6].

[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@brauner/T/#t

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Some drivers want to use cpu_logical_map(), early_cpu_to_node() and some
other CPU mapping APIs, even if we use "nr_cpus=1" to hard limit the CPU
number. This is strongly required for the multi-bridges machines.

Currently, we stop parsing the MADT if the nr_cpus limit is reached, but
to achieve the above goal we should always enumerate the MADT table and
setup logical-physical CPU mapping whether there is a nr_cpus limit.

Rework the MADT enumeration:

1. Define a flag "cpu_enumerated" to distinguish the first enumeration
   (cpu_enumerated=0) and the physical hotplug case (cpu_enumerated=1)
   for set_processor_mask().

2. If cpu_enumerated=0, stop parsing only when NR_CPUS limit is reached,
   so we can setup logical-physical CPU mapping; if cpu_enumerated=1,
   stop parsing when nr_cpu_ids limit is reached, so we can avoid some
   runtime bugs. Once logical-physical CPU mapping is setup, we will let
   cpu_enumerated=1.

3. Use find_first_zero_bit() instead of cpumask_next_zero() to find the
   next zero bit (free logical CPU id) in the cpu_present_mask, because
   cpumask_next_zero() will stop at nr_cpu_ids.

4. Only touch cpu_possible_mask if cpu_enumerated=0, this is in order to
   avoid some potential crashes, because cpu_possible_mask is marked as
   __ro_after_init.

5. In prefill_possible_map(), clear cpu_present_mask bits greater than
   nr_cpu_ids, in order to avoid a CPU be "present" but not "possible".

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add irq_work support for LoongArch via self IPIs. This make it possible
to run works in hardware interrupt context, which is a prerequisite for
NOHZ_FULL.

Implement:
 - arch_irq_work_raise()
 - arch_irq_work_has_interrupt()

Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In order for things like get_user_pages() to work on ZONE_DEVICE memory,
we need a software PTE bit to identify device-backed PFNs.  Hook this up
along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add ARCH_HAS_DEBUG_VM_PGTABLE selection in Kconfig, in order to make
corresponding vm debug features usable on LoongArch. Also update the
corresponding arch-support.txt document.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Currently, only TLB-based ioremap() support writecombine, so add the
counterpart for DMW-based ioremap() with help of DMW2. The base address
(WRITECOMBINE_BASE) is configured as 0xa000000000000000.

DMW3 is unused by kernel now, however firmware may leave garbage in them
and interfere kernel's address mapping. So clear it as necessary.

BTW, centralize the DMW configuration to macro SETUP_DMWINS.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Most LoongArch 64 machines are using custom "SADR" ACPI extension to
perform ACPI S3 sleep. However the standard ACPI way to perform sleep
is to write a value to ACPI PM1/SLEEP_CTL register, and this is never
supported properly in kernel.

Add standard S3 sleep by providing a default DoSuspend function which
calls ACPI's acpi_enter_sleep_state() routine when SADR is not provided
by the firmware.

Also fix suspend assembly code so that ra is set properly before go
into sleep routine. (Previously linked address of jirl was set to a0,
some firmware do require return address in a0 but it's already set with
la.pcrel before).

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Hibernation assumes the memory layout after resume be the same as that
before sleep, so it expects the kernel is loaded at the same position.
To achieve this goal we automatically disable KASLR if user explicitly
requests hibernation via the "resume=" command line. Since "nohibernate"
and "noresume" have higher priorities than "resume=", we only disable
KASLR if there is no "nohibernate" and "noresume".

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
fw_arg1 is in memory space rather than I/O space, so we should use
early_memremap_ro() instead of early_ioremap() to map the cmdline.
Moreover, we should unmap it after using.

Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
-Zdirect-access-external-data is a new Rust compiler option added in
Rust 1.78, which we use to optimize the access of external data in the
Linux kernel's Rust code. This patch modifies the Rust code in vmlinux
to directly access externa data, using PC-REL instead of GOT. However,
Rust code whithin modules is constrained by the PC-REL addressing range
and is explicitly set to use an indirect method.

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch defines UPROBE_SWBP_INSN as a function call and this breaks
arch_uprobe_trampoline() which uses it to initialize a static variable.

Add the new "__builtin_constant_p" helper, __emit_break(), and redefine
the current users of larch_insn_gen_break() to use it.

Fixes: ff474a7 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20240614174822.GA1185149@thelio-3990X/
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This add CPU HWMon (temperature sensor) platform driver for Loongson-3.

Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected,
cpu_max_bits_warn() generates a runtime warning similar as below while
we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
instead of NR_CPUS to iterate CPUs.

[    3.052463] ------------[ cut here ]------------
[    3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0
[    3.070072] Modules linked in: efivarfs autofs4
[    3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052
[    3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000
[    3.109127]         9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430
[    3.118774]         90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff
[    3.128412]         0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890
[    3.138056]         0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa
[    3.147711]         ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000
[    3.157364]         900000000101c998 0000000000000004 9000000000ef7430 0000000000000000
[    3.167012]         0000000000000009 000000000000006c 0000000000000000 0000000000000000
[    3.176641]         9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286
[    3.186260]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c
[    3.195868]         ...
[    3.199917] Call Trace:
[    3.203941] [<90000000002086d8>] show_stack+0x38/0x14c
[    3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88
[    3.217625] [<900000000023d268>] __warn+0xd0/0x100
[    3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc
[    3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0
[    3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4
[    3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4
[    3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0
[    3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100
[    3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94
[    3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160
[    3.281824] ---[ end trace 8b484262b4b8c24c ]---

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
If an NTFS file system is mounted to another system with different
PAGE_SIZE from the original system, log->page_size will change in
log_replay(), but log->page_{mask,bits} don't change correspondingly.
This will cause a panic because "u32 bytes = log->page_size - page_off"
will get a negative value in the later read_log_page().

Cc: stable@vger.kernel.org
Fixes: b46acd6 ("fs/ntfs3: Add NTFS journal")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The label end_reply is obviously a typo. It should be "replay" in this
context. So rename end_reply to end_replay.

Cc: stable@vger.kernel.org
Fixes: b46acd6 ("fs/ntfs3: Add NTFS journal")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Debug machanism include: gdb, ftrace, kprobe, uprobe and jump_label

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add driver support for the IOMMU in LS7A. If you need to enable it,
please add the loongson_iommu=on kernel parameter.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Consider a configuration like this:
1, efifb (or simpledrm) is built-in;
2, a native display driver (such as radeon) is also built-in.

As Javier said, this is not a common configuration (the native display
driver is usually built as a module), but it can happen and cause some
trouble.

In this case, since efifb, radeon and sysfb are all in device_initcall()
level, the order in practise is like this:

efifb registered at first, but no "efi-framebuffer" device yet. radeon
registered later, and /dev/fb0 created. sysfb_init() comes at last, it
registers "efi-framebuffer" and then causes an error message "efifb: a
framebuffer is already registered". Make sysfb_init() to be subsys_
initcall_sync() can avoid this. And Javier Martinez Canillas is trying
to make a more general solution in commit 873eb3b ("fbdev: Disable
sysfb device registration when removing conflicting FBs").

However, this patch still makes sense because it can make the screen
display as early as possible (We cannot move to subsys_initcall, since
sysfb_init() should be executed after PCI enumeration).

This is a better version of commit 60aebc9 ("drivers/firmware:
Move sysfb_init() from device_initcall to subsys_initcall_sync") since
the previous commit leads to blank displays on some systems. The reason
is that vgaarb initialization is also a subsys_initcall_sync function so
sysfb_disable() is sometimes missed. So here we move sysfb_init() to an
fs_initcall function which is ensured after vgaarb initialization.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
After commit 60aebc9 ("drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
screen until the display manager starts.

This regression occurs with such a Kconfig combination:
CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
CONFIG_DRM_SIMPLEDRM=y
CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu

If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
device), there is no blank screen. The root cause is the initialization
order, and this order depends on the Makefile.

FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
DRM_SIMPLEDRM will try to takeover I915, but fails to work.

So we can move the "tiny" directory before native DRM drivers to solve
this problem.

Fixes: 60aebc9 ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
Reported-by: Jaak Ristioja <jaak@ristioja.ee>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Radeon driver can not handle the interrupt is faster than DMA data, so
irq handler must update an old ih.rptr value in IH_RB_RPTR register to
enable interrupt again when interrupt is faster than DMA data.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Zhijie Zhang <zhangzhijie@loongson.cn>
chenhuacai pushed a commit that referenced this pull request Oct 29, 2024
There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.

Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:

kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20

This patch proposes a fixed that's based on adding 2 new additional
stid's sc_status values that help coordinate between the laundromat
and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()).

First to make sure, that once the stid is marked revoked, it is not
removed by the nfsd4_free_stateid(), the laundromat take a reference
on the stateid. Then, coordinating whether the stid has been put
on the cl_revoked list or we are processing FREE_STATEID and need to
make sure to remove it from the list, each check that state and act
accordingly. If laundromat has added to the cl_revoke list before
the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove
it from the list. If nfsd4_free_stateid() finds that operations arrived
before laundromat has placed it on cl_revoke list, it marks the state
freed and then laundromat will no longer add it to the list.

Also, for nfsd4_delegreturn() when looking for the specified stid,
we need to access stid that are marked removed or freeable, it means
the laundromat has started processing it but hasn't finished and this
delegreturn needs to return nfserr_deleg_revoked and not
nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the
lack of it will leave this stid on the cl_revoked list indefinitely.

Fixes: 2d4a532 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
@chenhuacai chenhuacai force-pushed the master branch 3 times, most recently from d8dcf28 to 776b61b Compare November 3, 2024 01:50
@chenhuacai chenhuacai force-pushed the master branch 4 times, most recently from a853793 to f0bd1eb Compare November 24, 2024 06:26
@chenhuacai chenhuacai force-pushed the master branch 8 times, most recently from 6687637 to 1d69018 Compare December 7, 2024 09:40
chenhuacai pushed a commit that referenced this pull request Dec 7, 2024
Hou Tao says:

====================
This patch set fixes several issues for LPM trie. These issues were
found during adding new test cases or were reported by syzbot.

The patch set is structured as follows:

Patch #1~#2 are clean-ups for lpm_trie_update_elem().
Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie.
Patch #4 fixes the accounting of n_entries when doing in-place update.
Patch #5 fixes the exact match condition in trie_get_next_key() and it
may skip keys when the passed key is not found in the map.
Patch #6~#7 switch from kmalloc() to bpf memory allocator for LPM trie
to fix several lock order warnings reported by syzbot. It also enables
raw_spinlock_t for LPM trie again. After these changes, the LPM trie will
be closer to being usable in any context (though the reentrance check of
trie->lock is still missing, but it is on my todo list).
Patch #8: move test_lpm_map to map_tests to make it run regularly.
Patch #9: add test cases for the issues fixed by patch #3~#5.

Please see individual patches for more details. Comments are always
welcome.

Change Log:
v3:
  * patch #2: remove the unnecessary NULL-init for im_node
  * patch #6: alloc the leaf node before disabling IRQ to low
    the possibility of -ENOMEM when leaf_size is large; Free
    these nodes outside the trie lock (Suggested by Alexei)
  * collect review and ack tags (Thanks for Toke & Daniel)

v2: https://lore.kernel.org/bpf/20241127004641.1118269-1-houtao@huaweicloud.com/
  * collect review tags (Thanks for Toke)
  * drop "Add bpf_mem_cache_is_mergeable() helper" patch
  * patch #3~#4: add fix tag
  * patch #4: rename the helper to trie_check_add_elem() and increase
    n_entries in it.
  * patch #6: use one bpf mem allocator and update commit message to
    clarify that using bpf mem allocator is more appropriate.
  * patch #7: update commit message to add the possible max running time
    for update operation.
  * patch #9: update commit message to specify the purpose of these test
    cases.

v1: https://lore.kernel.org/bpf/20241118010808.2243555-1-houtao@huaweicloud.com/
====================

Link: https://lore.kernel.org/all/20241206110622.1161752-1-houtao@huaweicloud.com/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
chenhuacai pushed a commit that referenced this pull request Dec 8, 2024
Kernel will hang on destroy admin_q while we create ctrl failed, such
as following calltrace:

PID: 23644    TASK: ff2d52b40f439fc0  CPU: 2    COMMAND: "nvme"
 #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
 #1 [ff61d23de260fc08] schedule at ffffffff8323c014
 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
 #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce
 #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced
 #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b
 #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362
 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
    RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
    RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
    RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
    R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

This due to we have quiesced admi_q before cancel requests, but forgot
to unquiesce before destroy it, as a result we fail to drain the
pending requests, and hang on blk_mq_freeze_queue_wait() forever. Here
try to reuse nvme_rdma_teardown_admin_queue() to fix this issue and
simplify the code.

Fixes: 958dc1d ("nvme-rdma: add clean action for failed reconnection")
Reported-by: Yingfu.zhou <yingfu.zhou@shopee.com>
Signed-off-by: Chunguang.xu <chunguang.xu@shopee.com>
Signed-off-by: Yue.zhao <yue.zhao@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
@chenhuacai chenhuacai force-pushed the master branch 2 times, most recently from 46867e4 to 99cb531 Compare December 15, 2024 14:17
chenhuacai pushed a commit that referenced this pull request Dec 15, 2024
Its used from trace__run(), for the 'perf trace' live mode, i.e. its
strace-like, non-perf.data file processing mode, the most common one.

The trace__run() function will set trace->host using machine__new_host()
that is supposed to give a machine instance representing the running
machine, and since we'll use perf_env__arch_strerrno() to get the right
errno -> string table, we need to use machine->env, so initialize it in
machine__new_host().

Before the patch:

  (gdb) run trace --errno-summary -a sleep 1
  <SNIP>
   Summary of events:

   gvfs-afc-volume (3187), 2 events, 0.0%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     pselect6               1      0     0.000     0.000     0.000     0.000      0.00%

   GUsbEventThread (3519), 2 events, 0.0%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     poll                   1      0     0.000     0.000     0.000     0.000      0.00%
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
  478		if (env->arch_strerrno == NULL)
  (gdb) bt
  #0  0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
  #1  0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673
  #2  0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708
  #3  0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747
  #4  0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456
  #5  0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487
  #6  0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351
  #7  0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404
  #8  0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448
  #9  0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560
  (gdb)

After:

  root@number:~# perf trace -a --errno-summary sleep 1
  <SNIP>
     pw-data-loop (2685), 1410 events, 16.0%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     epoll_wait           188      0   983.428     0.000     5.231    15.595      8.68%
     ioctl                 94      0     0.811     0.004     0.009     0.016      2.82%
     read                 188      0     0.322     0.001     0.002     0.006      5.15%
     write                141      0     0.280     0.001     0.002     0.018      8.39%
     timerfd_settime       94      0     0.138     0.001     0.001     0.007      6.47%

   gnome-control-c (179406), 1848 events, 20.9%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     poll                 222      0   959.577     0.000     4.322    21.414     11.40%
     recvmsg              150      0     0.539     0.001     0.004     0.013      5.12%
     write                300      0     0.442     0.001     0.001     0.007      3.29%
     read                 150      0     0.183     0.001     0.001     0.009      5.53%
     getpid               102      0     0.101     0.000     0.001     0.008      7.82%

  root@number:~#

Fixes: 54373b5 ("perf env: Introduce perf_env__arch_strerrno()")
Reported-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
@chenhuacai chenhuacai force-pushed the master branch 2 times, most recently from 8574ea2 to 9fa5382 Compare December 20, 2024 14:53
chenhuacai pushed a commit that referenced this pull request Dec 22, 2024
Chase reports that their tester complaints about a locking context
mismatch:

=============================
[ BUG: Invalid wait context ]
6.13.0-rc1-gf137f14b7ccb-dirty #9 Not tainted
-----------------------------
syz.1.25198/182604 is trying to lock:
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: spin_lock_irq
include/linux/spinlock.h:376 [inline]
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at:
io_match_task_safe io_uring/io_uring.c:218 [inline]
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at:
io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204
other info that might help us debug this:
context-{5:5}
1 lock held by syz.1.25198/182604:
 #0: ffff88802b7d48c0 (&acct->lock){+.+.}-{2:2}, at:
io_acct_cancel_pending_work+0x2d/0x6b0 io_uring/io-wq.c:1049
stack backtrace:
CPU: 0 UID: 0 PID: 182604 Comm: syz.1.25198 Not tainted
6.13.0-rc1-gf137f14b7ccb-dirty #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x82/0xd0 lib/dump_stack.c:120
 print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline]
 check_wait_context kernel/locking/lockdep.c:4898 [inline]
 __lock_acquire+0x883/0x3c80 kernel/locking/lockdep.c:5176
 lock_acquire.part.0+0x11b/0x370 kernel/locking/lockdep.c:5849
 __raw_spin_lock_irq include/linux/spinlock_api_smp.h:119 [inline]
 _raw_spin_lock_irq+0x36/0x50 kernel/locking/spinlock.c:170
 spin_lock_irq include/linux/spinlock.h:376 [inline]
 io_match_task_safe io_uring/io_uring.c:218 [inline]
 io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204
 io_acct_cancel_pending_work+0xb8/0x6b0 io_uring/io-wq.c:1052
 io_wq_cancel_pending_work io_uring/io-wq.c:1074 [inline]
 io_wq_cancel_cb+0xb0/0x390 io_uring/io-wq.c:1112
 io_uring_try_cancel_requests+0x15e/0xd70 io_uring/io_uring.c:3062
 io_uring_cancel_generic+0x6ec/0x8c0 io_uring/io_uring.c:3140
 io_uring_files_cancel include/linux/io_uring.h:20 [inline]
 do_exit+0x494/0x27a0 kernel/exit.c:894
 do_group_exit+0xb3/0x250 kernel/exit.c:1087
 get_signal+0x1d77/0x1ef0 kernel/signal.c:3017
 arch_do_signal_or_restart+0x79/0x5b0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xd8/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

which is because io_uring has ctx->timeout_lock nesting inside the
io-wq acct lock, the latter of which is used from inside the scheduler
and hence is a raw spinlock, while the former is a "normal" spinlock
and can hence be sleeping on PREEMPT_RT.

Change ctx->timeout_lock to be a raw spinlock to solve this nesting
dependency on PREEMPT_RT=y.

Reported-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
@chenhuacai chenhuacai force-pushed the master branch 3 times, most recently from cd3a92e to d42c90a Compare December 29, 2024 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants