Backport PLR (Power Limit Reasons) TPMI driver#7
Closed
zhang-rui wants to merge 11 commits into
Closed
Conversation
commit 9cba82b upstream. Patch series "Add helper macro DEFINE_SHOW_STORE_ATTRIBUTE() at seq_file.c", v6. We already own DEFINE_SHOW_ATTRIBUTE() helper macro for defining attribute for read-only file, but we found many of drivers also want a helper macro for read-write file too. So we add this helper macro to reduce duplicated code. This patch (of 3): We already own DEFINE_SHOW_ATTRIBUTE() helper macro for defining attribute for read-only file, but many of drivers want a helper macro for read-write file too. So we add DEFINE_SHOW_STORE_ATTRIBUTE() helper to reduce duplicated code. Intel-SIG: commit 9cba82b seq_file: add helper macro to define attribute for rw file Backport PLR (Power Limit Reasons) TPMI driver Link: https://lkml.kernel.org/r/20230905024835.43219-1-yangxingui@huawei.com Link: https://lkml.kernel.org/r/20230905024835.43219-2-yangxingui@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Co-developed-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xingui Yang <yangxingui@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Animesh Manna <animesh.manna@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Himanshu Madhani <himanshu.madhani@cavium.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Uma Shankar <uma.shankar@intel.com> Cc: Xiang Chen <chenxiang66@hisilicon.com> Cc: Zeng Tao <prime.zeng@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit 3cd39bc upstream. Touching files so used for the kernel, forces 'make' to recompile most of the kernel. Having those definitions in more granular files helps avoid recompiling so much of the kernel. Intel-SIG: commit 3cd39bc kernel.h: Move ARRAY_SIZE() to a separate header Backport PLR (Power Limit Reasons) TPMI driver Signed-off-by: Alejandro Colomar <alx@kernel.org> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20230817143352.132583-2-lucas.segarra.fernandez@intel.com [andy: reduced to cover only string.h for now] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit c3ed16a upstream. The commit 3cd39bc ("kernel.h: Move ARRAY_SIZE() to a separate header") introduced a new header for the ARRAY_SIZE macro which was previously exposed via linux/kernel.h. Intel-SIG: commit c3ed16a batman-adv: Switch to linux/array_size.h Backport PLR (Power Limit Reasons) TPMI driver Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit b44d79d upstream. Add TPMI ID 0x0C (Perf Limit Reasons) to the list of supported TPMI IDs. Intel-SIG: commit b44d79d platform/x86/intel/tpmi: Add support for performance limit reasons Backport PLR (Power Limit Reasons) TPMI driver Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20240527133400.483634-2-tero.kristo@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit d36842b upstream. Add new API to get the debugfs root directory for TPMI. This allows any TPMI devices to add their own debugfs items under the same directory structure. Intel-SIG: commit d36842b platform/x86/intel/tpmi: Add API to get debugfs root Backport PLR (Power Limit Reasons) TPMI driver Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20240527133400.483634-3-tero.kristo@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit 17ca278 upstream. Each TPMI power domain includes a group of CPUs. Several power management settings in this case applicable to a group of CPUs. There can be several power domains in a CPU package. So, provide interfaces for: - Get power domain id for a Linux CPU - Get mask of Linux CPUs in a power domain Hardware Punit uses different CPU numbering, which is not based on APIC (Advanced Programmable Interrupt Controller) CPU numbering. The Linux CPU numbering is based on APIC CPU numbering. Some PM features like Intel Speed Select, the CPU core mask provided by the hardware is based on the Punit CPU numbering. To use the core mask, this mask needs to be converted to a Linux CPUs mask. So, provide interfaces for: - Convert to a Linux CPU number from a Punit CPU number - Convert to a Punit CPU number from a Linux CPU number On each CPU online, MSR 0x54 is used to read the mapping and stores in a per cpu array. Create a hash for faster searching of a Linux CPU number from a Punit CPU number. Intel-SIG: commit 17ca278 platform/x86/intel: TPMI domain id and CPU mapping Backport PLR (Power Limit Reasons) TPMI driver Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [tero.kristo: minor updates] Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20240528073457.497816-1-tero.kristo@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit 811f67c upstream. Add new auxiliary driver that exposes the SoC performance limit reasons via debugfs interface. Intel-SIG: commit 811f67c platform/x86/intel/tpmi: Add new auxiliary driver for performance limits Backport PLR (Power Limit Reasons) TPMI driver Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20240527133400.483634-5-tero.kristo@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit 9e9397a upstream. Add support for reading fine grained power limit reasons via the PLR mailbox. Intel-SIG: commit 9e9397a platform/x86/intel/tpmi/plr: Add support for the plr mailbox Backport PLR (Power Limit Reasons) TPMI driver Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20240527133400.483634-6-tero.kristo@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit 182c694 upstream. Smatch complains that 'str' can be used without being initialized: drivers/platform/x86/intel/intel_plr_tpmi.c:178 plr_print_bits() error: uninitialized symbol 'str'. In this loop, we iterate over all the set bits and print the name of the bit. The intention is that if there is a bit which is between 0-31 we look for the name in the first array plr_coarse_reasons[] which has 10 elements. If the bit is in the 32-63 range we look for it in the plr_fine_reasons[] array which has 30 elements. If the bit is in the invalid ranges, 10-31 or 62-63, then we should print "UNKNOWN(%d)". The problem is that 'str' needs to be initialized at the start of each iteration, otherwise if we can't find the string then instead of printing "UNKNOWN(%d)", we will re-print whatever the previous bit was. Intel-SIG: commit 182c694 platform/x86/intel/tpmi/plr: Fix output in plr_print_bits() Backport PLR (Power Limit Reasons) TPMI driver Fixes: 9e9397a ("platform/x86/intel/tpmi/plr: Add support for the plr mailbox") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/b0084e70-4144-445a-9b89-fb19f6b8336a@stanley.mountain Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit bee9a0838fd223823e5a6d85c055ab1691dc738e upstream. Add Clearwater Forest support (INTEL_ATOM_DARKMONT_X) to tpmi_cpu_ids to support domaid id mappings. Intel-SIG: commit bee9a0838fd2 platform/x86/intel: power-domains: Add Clearwater Forest support Backport PLR (Power Limit Reasons) TPMI driver Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20250103155255.1488139-1-srinivas.pandruvada@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
…for package ID commit aa28991fd5dc4c01a40caab2bd9af8c5e06f9899 upstream. Currently, tpmi_get_logical_id() calls topology_physical_package_id() to set the pkg_id of the info structure. Since some VM hosts assign non contiguous package IDs, topology_physical_package_id() can return a larger value than topology_max_packages(). This will result in an invalid reference into tpmi_power_domain_mask[] as that is allocatead based on topology_max_packages() as the maximum package ID. Intel-SIG: commit aa28991fd5dc platform/x86/intel: power-domains: Use topology_logical_package_id() for package ID Backport PLR (Power Limit Reasons) TPMI driver Fixes: 17ca278 ("platform/x86/intel: TPMI domain id and CPU mapping") Signed-off-by: David Arcari <darcari@redhat.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20250829113859.1772827-1-darcari@redhat.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [ Zhang Rui: amend commit log ] Signed-off-by: Zhang Rui <rui.zhang@intel.com>
x56Jason
pushed a commit
that referenced
this pull request
Dec 14, 2025
commit 05703271c3cdcc0f2a8cf6ebdc45892b8ca83520 upstream. Before disabling SR-IOV via config space accesses to the parent PF, sriov_disable() first removes the PCI devices representing the VFs. Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()") such removal operations are serialized against concurrent remove and rescan using the pci_rescan_remove_lock. No such locking was ever added in sriov_disable() however. In particular when commit 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device removal into sriov_del_vfs() there was still no locking around the pci_iov_remove_virtfn() calls. On s390 the lack of serialization in sriov_disable() may cause double remove and list corruption with the below (amended) trace being observed: PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 0000000000000001 0000000000000000 0000000000000000 0000000180692828 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 #0 [3800313fb20] device_del at c9158ad5c #1 [3800313fb88] pci_remove_bus_device at c915105ba #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 #5 [3800313fca0] __zpci_event_availability at c90fb3dca #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 #7 [3800313fd60] crw_collect_info at c91905822 #8 [3800313fe10] kthread at c90feb390 #9 [3800313fe68] __ret_from_fork at c90f6aa64 openvelinux#10 [3800313fe98] ret_from_fork at c9194f3f2. This is because in addition to sriov_disable() removing the VFs, the platform also generates hot-unplug events for the VFs. This being the reverse operation to the hotplug events generated by sriov_enable() and handled via pdev->no_vf_scan. And while the event processing takes pci_rescan_remove_lock and checks whether the struct pci_dev still exists, the lack of synchronization makes this checking racy. Other races may also be possible of course though given that this lack of locking persisted so long observable races seem very rare. Even on s390 the list corruption was only observed with certain devices since the platform events are only triggered by config accesses after the removal, so as long as the removal finished synchronously they would not race. Either way the locking is missing so fix this by adding it to the sriov_del_vfs() helper. Just like PCI rescan-remove, locking is also missing in sriov_add_vfs() including for the error case where pci_stop_and_remove_bus_device() is called without the PCI rescan-remove lock being held. Even in the non-error case, adding new PCI devices and buses should be serialized via the PCI rescan-remove lock. Add the necessary locking. Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
x56Jason
pushed a commit
that referenced
this pull request
Mar 3, 2026
commit 05703271c3cdcc0f2a8cf6ebdc45892b8ca83520 upstream. Before disabling SR-IOV via config space accesses to the parent PF, sriov_disable() first removes the PCI devices representing the VFs. Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()") such removal operations are serialized against concurrent remove and rescan using the pci_rescan_remove_lock. No such locking was ever added in sriov_disable() however. In particular when commit 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device removal into sriov_del_vfs() there was still no locking around the pci_iov_remove_virtfn() calls. On s390 the lack of serialization in sriov_disable() may cause double remove and list corruption with the below (amended) trace being observed: PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56) GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001 00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480 0000000000000001 0000000000000000 0000000000000000 0000000180692828 00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8 #0 [3800313fb20] device_del at c9158ad5c #1 [3800313fb88] pci_remove_bus_device at c915105ba #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198 #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0 #4 [3800313fc60] zpci_bus_remove_device at c90fb6104 #5 [3800313fca0] __zpci_event_availability at c90fb3dca #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2 #7 [3800313fd60] crw_collect_info at c91905822 #8 [3800313fe10] kthread at c90feb390 #9 [3800313fe68] __ret_from_fork at c90f6aa64 openvelinux#10 [3800313fe98] ret_from_fork at c9194f3f2. This is because in addition to sriov_disable() removing the VFs, the platform also generates hot-unplug events for the VFs. This being the reverse operation to the hotplug events generated by sriov_enable() and handled via pdev->no_vf_scan. And while the event processing takes pci_rescan_remove_lock and checks whether the struct pci_dev still exists, the lack of synchronization makes this checking racy. Other races may also be possible of course though given that this lack of locking persisted so long observable races seem very rare. Even on s390 the list corruption was only observed with certain devices since the platform events are only triggered by config accesses after the removal, so as long as the removal finished synchronously they would not race. Either way the locking is missing so fix this by adding it to the sriov_del_vfs() helper. Just like PCI rescan-remove, locking is also missing in sriov_add_vfs() including for the error case where pci_stop_and_remove_bus_device() is called without the PCI rescan-remove lock being held. Even in the non-error case, adding new PCI devices and buses should be serialized via the PCI rescan-remove lock. Add the necessary locking. Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
x56Jason
pushed a commit
that referenced
this pull request
Mar 3, 2026
…le_direct_reclaim() commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 upstream. The task sometimes continues looping in throttle_direct_reclaim() because allow_direct_reclaim(pgdat) keeps returning false. #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c #2 [ffff80002cb6f990] schedule at ffff800008abc50c #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4 At this point, the pgdat contains the following two zones: NODE: 4 ZONE: 0 ADDR: ffff00817fffe540 NAME: "DMA32" SIZE: 20480 MIN/LOW/HIGH: 11/28/45 VM_STAT: NR_FREE_PAGES: 359 NR_ZONE_INACTIVE_ANON: 18813 NR_ZONE_ACTIVE_ANON: 0 NR_ZONE_INACTIVE_FILE: 50 NR_ZONE_ACTIVE_FILE: 0 NR_ZONE_UNEVICTABLE: 0 NR_ZONE_WRITE_PENDING: 0 NR_MLOCK: 0 NR_BOUNCE: 0 NR_ZSPAGES: 0 NR_FREE_CMA_PAGES: 0 NODE: 4 ZONE: 1 ADDR: ffff00817fffec00 NAME: "Normal" SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264 VM_STAT: NR_FREE_PAGES: 146 NR_ZONE_INACTIVE_ANON: 94668 NR_ZONE_ACTIVE_ANON: 3 NR_ZONE_INACTIVE_FILE: 735 NR_ZONE_ACTIVE_FILE: 78 NR_ZONE_UNEVICTABLE: 0 NR_ZONE_WRITE_PENDING: 0 NR_MLOCK: 0 NR_BOUNCE: 0 NR_ZSPAGES: 0 NR_FREE_CMA_PAGES: 0 In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of inactive/active file-backed pages calculated in zone_reclaimable_pages() based on the result of zone_page_state_snapshot() is zero. Additionally, since this system lacks swap, the calculation of inactive/ active anonymous pages is skipped. crash> p nr_swap_pages nr_swap_pages = $1937 = { counter = 0 } As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having free pages significantly exceeding the high watermark. The problem is that the pgdat->kswapd_failures hasn't been incremented. crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures $1935 = 0x0 This is because the node deemed balanced. The node balancing logic in balance_pgdat() evaluates all zones collectively. If one or more zones (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the entire node is deemed balanced. This causes balance_pgdat() to exit early before incrementing the kswapd_failures, as it considers the overall memory state acceptable, even though some zones (like ZONE_NORMAL) remain under significant pressure. The patch ensures that zone_reclaimable_pages() includes free pages (NR_FREE_PAGES) in its calculation when no other reclaimable pages are available (e.g., file-backed or anonymous pages). This change prevents zones like ZONE_DMA32, which have sufficient free pages, from being mistakenly deemed unreclaimable. By doing so, the patch ensures proper node balancing, avoids masking pressure on other zones like ZONE_NORMAL, and prevents infinite loops in throttle_direct_reclaim() caused by allow_direct_reclaim(pgdat) repeatedly returning false. The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused by a node being incorrectly deemed balanced despite pressure in certain zones, such as ZONE_NORMAL. This issue arises from zone_reclaimable_pages() returning 0 for zones without reclaimable file- backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient free pages to be skipped. The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored during reclaim, masking pressure in other zones. Consequently, pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback mechanisms in allow_direct_reclaim() from being triggered, leading to an infinite loop in throttle_direct_reclaim(). This patch modifies zone_reclaimable_pages() to account for free pages (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones with sufficient free pages are not skipped, enabling proper balancing and reclaim behavior. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com Fixes: 5a1c84b ("mm: remove reclaim and compaction retry approximations") Signed-off-by: Seiji Nishikawa <snishika@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ashish Bijlani <ashish.bijlani@bytedance.com>
x56Jason
pushed a commit
that referenced
this pull request
Mar 3, 2026
[ Upstream commit f1e197a ] trace_drop_common() is called with preemption disabled, and it acquires a spin_lock. This is problematic for RT kernels because spin_locks are sleeping locks in this configuration, which causes the following splat: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47 preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 5 locks held by rcuc/47/449: #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210 #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130 #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210 #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290 irq event stamp: 139909 hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80 hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290 softirqs last enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170 softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0 Preemption disabled at: [<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0 CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7 Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022 Call Trace: <TASK> dump_stack_lvl+0x8c/0xd0 dump_stack+0x14/0x20 __might_resched+0x21e/0x2f0 rt_spin_lock+0x5e/0x130 ? trace_drop_common.constprop.0+0xb5/0x290 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_drop_common.constprop.0+0xb5/0x290 ? preempt_count_sub+0x1c/0xd0 ? _raw_spin_unlock_irqrestore+0x4a/0x80 ? __pfx_trace_drop_common.constprop.0+0x10/0x10 ? rt_mutex_slowunlock+0x26a/0x2e0 ? skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_rt_mutex_slowunlock+0x10/0x10 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_kfree_skb_hit+0x15/0x20 trace_kfree_skb+0xe9/0x150 kfree_skb_reason+0x7b/0x110 skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10 ? mark_lock.part.0+0x8a/0x520 ... trace_drop_common() also disables interrupts, but this is a minor issue because we could easily replace it with a local_lock. Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic context. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reported-by: Hu Chunyu <chuhu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Ashish Bijlani <ashish.bijlani@bytedance.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport PLR (Power Limit Reasons) TPMI driver
Test:
Tested on GNR/CWF, the Perf Limit Reasons driver bugfs is shown and providing valid information.
1: seq_file: add helper macro to define attribute for rw file
2: kernel.h: Move ARRAY_SIZE() to a separate header
3: batman-adv: Switch to linux/array_size.h
4: platform/x86/intel/tpmi: Add support for performance limit reasons
5: platform/x86/intel/tpmi: Add API to get debugfs root
6: platform/x86/intel: TPMI domain id and CPU mapping
7: platform/x86/intel/tpmi: Add new auxiliary driver for performance limits
8: platform/x86/intel/tpmi/plr: Add support for the plr mailbox
9: platform/x86/intel/tpmi/plr: Fix output in plr_print_bits()
10: platform/x86/intel: power-domains: Add Clearwater Forest support
11: platform/x86/intel: power-domains: Use topology_logical_package_id() for package ID