Skip to content

Switch everything to use an SD card based rootfs#2

Merged
arm000 merged 2 commits intoarm000:linux-next/mboxicfrom
alexhorner:irq-forward
Jan 15, 2023
Merged

Switch everything to use an SD card based rootfs#2
arm000 merged 2 commits intoarm000:linux-next/mboxicfrom
alexhorner:irq-forward

Conversation

@alexhorner
Copy link

No description provided.

@arm000 arm000 merged commit 0e5f19a into arm000:linux-next/mboxic Jan 15, 2023
arm000 pushed a commit that referenced this pull request Jan 21, 2023
Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP
and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

Command to trigger the warning:
  # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

   Performance counter stats for 'sleep 5':

                   0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/

         5.002117947 seconds time elapsed

         0.000131000 seconds user
         0.001063000 seconds sys

Below is snippet of the warning in dmesg:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
  preempt_count: 2, expected: 0
  4 locks held by perf-exec/2869:
   #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
   #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
   #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
   #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
  irq event stamp: 4806
  hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
  hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
  softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    __might_resched+0x2f8/0x310
    __mutex_lock+0x6c/0x13f0
    thread_imc_event_add+0xf4/0x1b0
    event_sched_in+0xe0/0x210
    merge_sched_in+0x1f0/0x600
    visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
    ctx_flexible_sched_in+0xcc/0x140
    ctx_sched_in+0x20c/0x2a0
    ctx_resched+0x104/0x1c0
    perf_event_exec+0x340/0x510
    begin_new_exec+0x730/0xef0
    load_elf_binary+0x3f8/0x1e10
  ...
  do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
  WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
  CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
  REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
  CFAR: c00000000013fb64 IRQMASK: 1

The above warning triggered because the current imc-pmu code uses mutex
lock in interrupt disabled sections. The function mutex_lock()
internally calls __might_resched(), which will check if IRQs are
disabled and in case IRQs are disabled, it will trigger the warning.

Fix the issue by changing the mutex lock to spinlock.

Fixes: 8f95faa ("powerpc/powernv: Detect and create IMC device")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Fix comments, trim oops in change log, add reported-by tags]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
arm000 pushed a commit that referenced this pull request Jan 21, 2023
Patch series "mm/hugetlb: uffd-wp fixes for hugetlb_change_protection()".

Playing with virtio-mem and background snapshots (using uffd-wp) on
hugetlb in QEMU, I managed to trigger a VM_BUG_ON().  Looking into the
details, hugetlb_change_protection() seems to not handle uffd-wp correctly
in all cases.

Patch #1 fixes my test case.  I don't have reproducers for patch #2, as it
requires running into migration entries.

I did not yet check in detail yet if !hugetlb code requires similar care.


This patch (of 2):

There are two problematic cases when stumbling over a PTE marker in
hugetlb_change_protection():

(1) We protect an uffd-wp PTE marker a second time using uffd-wp: we will
    end up in the "!huge_pte_none(pte)" case and mess up the PTE marker.

(2) We unprotect a uffd-wp PTE marker: we will similarly end up in the
    "!huge_pte_none(pte)" case even though we cleared the PTE, because
    the "pte" variable is stale. We'll mess up the PTE marker.

For example, if we later stumble over such a "wrongly modified" PTE marker,
we'll treat it like a present PTE that maps some garbage page.

This can, for example, be triggered by mapping a memfd backed by huge
pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a)
uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c)
unregistering uffd-wp. Then, ff we trigger fallocate(FALLOC_FL_PUNCH_HOLE)
on that file range, we will run into a VM_BUG_ON:

[  195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
[  195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff)
[  195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000
[  195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page))
[  195.039573] ------------[ cut here ]------------
[  195.039574] kernel BUG at mm/rmap.c:1346!
[  195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1
[  195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022
[  195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550
[  195.039588] Code: [...]
[  195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292
[  195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000
[  195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff
[  195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08
[  195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0
[  195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100
[  195.039595] FS:  00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000
[  195.039596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0
[  195.039598] PKRU: 55555554
[  195.039599] Call Trace:
[  195.039600]  <TASK>
[  195.039602]  __unmap_hugepage_range+0x33b/0x7d0
[  195.039605]  unmap_hugepage_range+0x55/0x70
[  195.039608]  hugetlb_vmdelete_list+0x77/0xa0
[  195.039611]  hugetlbfs_fallocate+0x410/0x550
[  195.039612]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[  195.039616]  vfs_fallocate+0x12e/0x360
[  195.039618]  __x64_sys_fallocate+0x40/0x70
[  195.039620]  do_syscall_64+0x58/0x80
[  195.039623]  ? syscall_exit_to_user_mode+0x17/0x40
[  195.039624]  ? do_syscall_64+0x67/0x80
[  195.039626]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  195.039628] RIP: 0033:0x7fc7b590651f
[  195.039653] Code: [...]
[  195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[  195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f
[  195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c
[  195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073
[  195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000
[  195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000
[  195.039661]  </TASK>

Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over
an exclusive marker.  spin_unlock() + continue would get the job done.

However, instead, make it clearer that there are no fall-through
statements: we process each case (hwpoison, migration, marker, !none,
none) and then unlock the page table to continue with the next PTE.  Let's
avoid "continue" statements and use a single spin_unlock() at the end.

Link: https://lkml.kernel.org/r/20221222205511.675832-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221222205511.675832-2-david@redhat.com
Fixes: 60dfaad ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
arm000 pushed a commit that referenced this pull request Jan 25, 2023
This lockdep splat says it better than I could:

================================
WARNING: inconsistent lock state
6.2.0-rc2-07010-ga9b9500ffaac-dirty #967 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0
{IN-SOFTIRQ-W} state was registered at:
  _raw_spin_lock+0x5c/0xc0
  sch_direct_xmit+0x148/0x37c
  __dev_queue_xmit+0x528/0x111c
  ip6_finish_output2+0x5ec/0xb7c
  ip6_finish_output+0x240/0x3f0
  ip6_output+0x78/0x360
  ndisc_send_skb+0x33c/0x85c
  ndisc_send_rs+0x54/0x12c
  addrconf_rs_timer+0x154/0x260
  call_timer_fn+0xb8/0x3a0
  __run_timers.part.0+0x214/0x26c
  run_timer_softirq+0x3c/0x74
  __do_softirq+0x14c/0x5d8
  ____do_softirq+0x10/0x20
  call_on_irq_stack+0x2c/0x5c
  do_softirq_own_stack+0x1c/0x30
  __irq_exit_rcu+0x168/0x1a0
  irq_exit_rcu+0x10/0x40
  el1_interrupt+0x38/0x64
irq event stamp: 7825
hardirqs last  enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130
hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8
softirqs last  enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8
softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(_xmit_ETHER#2);
  <Interrupt>
    lock(_xmit_ETHER#2);

 *** DEADLOCK ***

3 locks held by kworker/1:3/179:
 #0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
 #1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
 #2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34

Workqueue: events enetc_tx_onestep_tstamp
Call trace:
 print_usage_bug.part.0+0x208/0x22c
 mark_lock+0x7f0/0x8b0
 __lock_acquire+0x7c4/0x1ce0
 lock_acquire.part.0+0xe0/0x220
 lock_acquire+0x68/0x84
 _raw_spin_lock+0x5c/0xc0
 netif_freeze_queues+0x5c/0xc0
 netif_tx_lock+0x24/0x34
 enetc_tx_onestep_tstamp+0x20/0x100
 process_one_work+0x28c/0x6c0
 worker_thread+0x74/0x450
 kthread+0x118/0x11c

but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in
process context, therefore with softirqs enabled (i.o.w., it can be
interrupted by a softirq). If we hold the netif_tx_lock() when there is
an interrupt, and the NET_TX softirq then gets scheduled, this will take
the netif_tx_lock() a second time and deadlock the kernel.

To solve this, use netif_tx_lock_bh(), which blocks softirqs from
running.

Fixes: 7294380 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230112105440.1786799-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
arm000 pushed a commit that referenced this pull request Jan 25, 2023
This fixes the following trace caused by attempting to lock
cmd_sync_work_lock while holding the rcu_read_lock:

kworker/u3:2/212 is trying to lock:
ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at:
hci_cmd_sync_queue+0xad/0x140
other info that might help us debug this:
context-{4:4}
4 locks held by kworker/u3:2/212:
 #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
 process_one_work+0x4dc/0x9a0
 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
 at: process_one_work+0x4dc/0x9a0
 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at:
 hci_cc_le_set_cig_params+0x64/0x4f0
 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at:
 hci_cc_le_set_cig_params+0x2f9/0x4f0

Fixes: 26afbd8 ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
arm000 pushed a commit that referenced this pull request Jan 25, 2023
This attempts to fix the following trace:

iso-tester/52 is trying to acquire lock:
ffff8880024e0070 (&hdev->lock){+.+.}-{3:3}, at:
iso_sock_listen+0x29e/0x440

but task is already holding lock:
ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
iso_sock_listen+0x8b/0x440

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
       lock_acquire+0x176/0x3d0
       lock_sock_nested+0x32/0x80
       iso_connect_cfm+0x1a3/0x630
       hci_cc_le_setup_iso_path+0x195/0x340
       hci_cmd_complete_evt+0x1ae/0x500
       hci_event_packet+0x38e/0x7c0
       hci_rx_work+0x34c/0x980
       process_one_work+0x5a5/0x9a0
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x22/0x30

-> #1 (hci_cb_list_lock){+.+.}-{3:3}:
       lock_acquire+0x176/0x3d0
       __mutex_lock+0x13b/0xf50
       hci_le_remote_feat_complete_evt+0x17e/0x320
       hci_event_packet+0x38e/0x7c0
       hci_rx_work+0x34c/0x980
       process_one_work+0x5a5/0x9a0
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x22/0x30

-> #0 (&hdev->lock){+.+.}-{3:3}:
       check_prev_add+0xfc/0x1190
       __lock_acquire+0x1e27/0x2750
       lock_acquire+0x176/0x3d0
       __mutex_lock+0x13b/0xf50
       iso_sock_listen+0x29e/0x440
       __sys_listen+0xe6/0x160
       __x64_sys_listen+0x25/0x30
       do_syscall_64+0x42/0x90
       entry_SYSCALL_64_after_hwframe+0x62/0xcc

other info that might help us debug this:

Chain exists of:
  &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
                               lock(hci_cb_list_lock);
                               lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
  lock(&hdev->lock);

 *** DEADLOCK ***

1 lock held by iso-tester/52:
 #0: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
 iso_sock_listen+0x8b/0x440

Fixes: f764a6c ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
arm000 pushed a commit that referenced this pull request Jan 25, 2023
The cited commit changed class of tc_ht internal mutex in order to avoid
false lock dependency with fs_core node and flow_table hash table
structures. However, hash table implementation internally also includes a
workqueue task with its own lockdep map which causes similar bogus lockdep
splat[0]. Fix it by also adding dedicated class for hash table workqueue
work structure of tc_ht.

[0]:

[ 1139.672465] ======================================================
[ 1139.673552] WARNING: possible circular locking dependency detected
[ 1139.674635] 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 Not tainted
[ 1139.675734] ------------------------------------------------------
[ 1139.676801] modprobe/5998 is trying to acquire lock:
[ 1139.677726] ffff88811e7b93b8 (&node->lock){++++}-{3:3}, at: down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.679662]
               but task is already holding lock:
[ 1139.680703] ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0
[ 1139.682223]
               which lock already depends on the new lock.

[ 1139.683640]
               the existing dependency chain (in reverse order) is:
[ 1139.684887]
               -> #2 (&tc_ht_lock_key){+.+.}-{3:3}:
[ 1139.685975]        __mutex_lock+0x12c/0x14b0
[ 1139.686659]        rht_deferred_worker+0x35/0x1540
[ 1139.687405]        process_one_work+0x7c2/0x1310
[ 1139.688134]        worker_thread+0x59d/0xec0
[ 1139.688820]        kthread+0x28f/0x330
[ 1139.689444]        ret_from_fork+0x1f/0x30
[ 1139.690106]
               -> #1 ((work_completion)(&ht->run_work)){+.+.}-{0:0}:
[ 1139.691250]        __flush_work+0xe8/0x900
[ 1139.691915]        __cancel_work_timer+0x2ca/0x3f0
[ 1139.692655]        rhashtable_free_and_destroy+0x22/0x6f0
[ 1139.693472]        del_sw_flow_table+0x22/0xb0 [mlx5_core]
[ 1139.694592]        tree_put_node+0x24c/0x450 [mlx5_core]
[ 1139.695686]        tree_remove_node+0x6e/0x100 [mlx5_core]
[ 1139.696803]        mlx5_destroy_flow_table+0x187/0x690 [mlx5_core]
[ 1139.698017]        mlx5e_tc_nic_cleanup+0x2f8/0x400 [mlx5_core]
[ 1139.699217]        mlx5e_cleanup_nic_rx+0x2b/0x210 [mlx5_core]
[ 1139.700397]        mlx5e_detach_netdev+0x19d/0x2b0 [mlx5_core]
[ 1139.701571]        mlx5e_suspend+0xdb/0x140 [mlx5_core]
[ 1139.702665]        mlx5e_remove+0x89/0x190 [mlx5_core]
[ 1139.703756]        auxiliary_bus_remove+0x52/0x70
[ 1139.704492]        device_release_driver_internal+0x3c1/0x600
[ 1139.705360]        bus_remove_device+0x2a5/0x560
[ 1139.706080]        device_del+0x492/0xb80
[ 1139.706724]        mlx5_rescan_drivers_locked+0x194/0x6a0 [mlx5_core]
[ 1139.707961]        mlx5_unregister_device+0x7a/0xa0 [mlx5_core]
[ 1139.709138]        mlx5_uninit_one+0x5f/0x160 [mlx5_core]
[ 1139.710252]        remove_one+0xd1/0x160 [mlx5_core]
[ 1139.711297]        pci_device_remove+0x96/0x1c0
[ 1139.722721]        device_release_driver_internal+0x3c1/0x600
[ 1139.723590]        unbind_store+0x1b1/0x200
[ 1139.724259]        kernfs_fop_write_iter+0x348/0x520
[ 1139.725019]        vfs_write+0x7b2/0xbf0
[ 1139.725658]        ksys_write+0xf3/0x1d0
[ 1139.726292]        do_syscall_64+0x3d/0x90
[ 1139.726942]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.727769]
               -> #0 (&node->lock){++++}-{3:3}:
[ 1139.728698]        __lock_acquire+0x2cf5/0x62f0
[ 1139.729415]        lock_acquire+0x1c1/0x540
[ 1139.730076]        down_write+0x8e/0x1f0
[ 1139.730709]        down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.731841]        mlx5_del_flow_rules+0x6f/0x610 [mlx5_core]
[ 1139.732982]        __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core]
[ 1139.734207]        mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core]
[ 1139.735491]        mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core]
[ 1139.736716]        mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core]
[ 1139.738007]        mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core]
[ 1139.739213]        mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core]
[ 1139.740377]        _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core]
[ 1139.741534]        rhashtable_free_and_destroy+0x3be/0x6f0
[ 1139.742351]        mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core]
[ 1139.743512]        mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core]
[ 1139.744683]        mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
[ 1139.745860]        mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core]
[ 1139.747098]        mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
[ 1139.748372]        mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core]
[ 1139.749590]        __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core]
[ 1139.750813]        mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core]
[ 1139.752147]        mlx5e_rep_remove+0x62/0x80 [mlx5_core]
[ 1139.753293]        auxiliary_bus_remove+0x52/0x70
[ 1139.754028]        device_release_driver_internal+0x3c1/0x600
[ 1139.754885]        driver_detach+0xc1/0x180
[ 1139.755553]        bus_remove_driver+0xef/0x2e0
[ 1139.756260]        auxiliary_driver_unregister+0x16/0x50
[ 1139.757059]        mlx5e_rep_cleanup+0x19/0x30 [mlx5_core]
[ 1139.758207]        mlx5e_cleanup+0x12/0x30 [mlx5_core]
[ 1139.759295]        mlx5_cleanup+0xc/0x49 [mlx5_core]
[ 1139.760384]        __x64_sys_delete_module+0x2b5/0x450
[ 1139.761166]        do_syscall_64+0x3d/0x90
[ 1139.761827]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.762663]
               other info that might help us debug this:

[ 1139.763925] Chain exists of:
                 &node->lock --> (work_completion)(&ht->run_work) --> &tc_ht_lock_key

[ 1139.765743]  Possible unsafe locking scenario:

[ 1139.766688]        CPU0                    CPU1
[ 1139.767399]        ----                    ----
[ 1139.768111]   lock(&tc_ht_lock_key);
[ 1139.768704]                                lock((work_completion)(&ht->run_work));
[ 1139.769869]                                lock(&tc_ht_lock_key);
[ 1139.770770]   lock(&node->lock);
[ 1139.771326]
                *** DEADLOCK ***

[ 1139.772345] 2 locks held by modprobe/5998:
[ 1139.772994]  #0: ffff88813c1ff0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
[ 1139.774399]  #1: ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0
[ 1139.775822]
               stack backtrace:
[ 1139.776579] CPU: 3 PID: 5998 Comm: modprobe Not tainted 6.1.0_for_upstream_debug_2022_12_12_17_02 #1
[ 1139.777935] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1139.779529] Call Trace:
[ 1139.779992]  <TASK>
[ 1139.780409]  dump_stack_lvl+0x57/0x7d
[ 1139.781015]  check_noncircular+0x278/0x300
[ 1139.781687]  ? print_circular_bug+0x460/0x460
[ 1139.782381]  ? rcu_read_lock_sched_held+0x3f/0x70
[ 1139.783121]  ? lock_release+0x487/0x7c0
[ 1139.783759]  ? orc_find.part.0+0x1f1/0x330
[ 1139.784423]  ? mark_lock.part.0+0xef/0x2fc0
[ 1139.785091]  __lock_acquire+0x2cf5/0x62f0
[ 1139.785754]  ? register_lock_class+0x18e0/0x18e0
[ 1139.786483]  lock_acquire+0x1c1/0x540
[ 1139.787093]  ? down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.788195]  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
[ 1139.788978]  ? register_lock_class+0x18e0/0x18e0
[ 1139.789715]  down_write+0x8e/0x1f0
[ 1139.790292]  ? down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.791380]  ? down_write_killable+0x220/0x220
[ 1139.792080]  ? find_held_lock+0x2d/0x110
[ 1139.792713]  down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.793795]  mlx5_del_flow_rules+0x6f/0x610 [mlx5_core]
[ 1139.794879]  __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core]
[ 1139.796032]  ? __esw_offloads_unload_rep+0xd0/0xd0 [mlx5_core]
[ 1139.797227]  ? xa_load+0x11a/0x200
[ 1139.797800]  ? __xa_clear_mark+0xf0/0xf0
[ 1139.798438]  mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core]
[ 1139.799660]  mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core]
[ 1139.800821]  mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core]
[ 1139.802049]  ? mlx5_eswitch_get_uplink_priv+0x25/0x80 [mlx5_core]
[ 1139.803260]  mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core]
[ 1139.804398]  ? __cancel_work_timer+0x1c2/0x3f0
[ 1139.805099]  ? mlx5e_tc_unoffload_from_slow_path+0x460/0x460 [mlx5_core]
[ 1139.806387]  mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core]
[ 1139.807481]  _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core]
[ 1139.808564]  rhashtable_free_and_destroy+0x3be/0x6f0
[ 1139.809336]  ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core]
[ 1139.809336]  ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core]
[ 1139.810455]  mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core]
[ 1139.811552]  mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core]
[ 1139.812655]  mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
[ 1139.813768]  mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core]
[ 1139.814952]  mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
[ 1139.816166]  mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core]
[ 1139.817336]  __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core]
[ 1139.818507]  mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core]
[ 1139.819788]  ? mlx5_eswitch_uplink_get_proto_dev+0x30/0x30 [mlx5_core]
[ 1139.821051]  ? kernfs_find_ns+0x137/0x310
[ 1139.821705]  mlx5e_rep_remove+0x62/0x80 [mlx5_core]
[ 1139.822778]  auxiliary_bus_remove+0x52/0x70
[ 1139.823449]  device_release_driver_internal+0x3c1/0x600
[ 1139.824240]  driver_detach+0xc1/0x180
[ 1139.824842]  bus_remove_driver+0xef/0x2e0
[ 1139.825504]  auxiliary_driver_unregister+0x16/0x50
[ 1139.826245]  mlx5e_rep_cleanup+0x19/0x30 [mlx5_core]
[ 1139.827322]  mlx5e_cleanup+0x12/0x30 [mlx5_core]
[ 1139.828345]  mlx5_cleanup+0xc/0x49 [mlx5_core]
[ 1139.829382]  __x64_sys_delete_module+0x2b5/0x450
[ 1139.830119]  ? module_flags+0x300/0x300
[ 1139.830750]  ? task_work_func_match+0x50/0x50
[ 1139.831440]  ? task_work_cancel+0x20/0x20
[ 1139.832088]  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
[ 1139.832873]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 1139.833661]  ? trace_hardirqs_on+0x2d/0x100
[ 1139.834328]  do_syscall_64+0x3d/0x90
[ 1139.834922]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.835700] RIP: 0033:0x7f153e71288b
[ 1139.836302] Code: 73 01 c3 48 8b 0d 9d 75 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 75 0e 00 f7 d8 64 89 01 48
[ 1139.838866] RSP: 002b:00007ffe0a3ed938 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1139.840020] RAX: ffffffffffffffda RBX: 0000564c2cbf8220 RCX: 00007f153e71288b
[ 1139.841043] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564c2cbf8288
[ 1139.842072] RBP: 0000564c2cbf8220 R08: 0000000000000000 R09: 0000000000000000
[ 1139.843094] R10: 00007f153e7a3ac0 R11: 0000000000000206 R12: 0000564c2cbf8288
[ 1139.844118] R13: 0000000000000000 R14: 0000564c2cbf7ae8 R15: 00007ffe0a3efcb8

Fixes: 9ba3333 ("net/mlx5e: Avoid false lock depenency warning on tc_ht")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
arm000 pushed a commit that referenced this pull request Jan 25, 2023
The commit 4af1b64 ("octeontx2-pf: Fix lmtst ID used in aura
free") uses the get/put_cpu() to protect the usage of percpu pointer
in ->aura_freeptr() callback, but it also unnecessarily disable the
preemption for the blockable memory allocation. The commit 87b93b6
("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to
fix these sleep inside atomic warnings. But it only fix the one for
the non-rt kernel. For the rt kernel, we still get the similar warnings
like below.
  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  3 locks held by swapper/0/1:
   #0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
   #1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
   #2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
  Preemption disabled at:
  [<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
  CPU: 20 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc3-rt1-yocto-preempt-rt #1
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  Call trace:
   dump_backtrace.part.0+0xe8/0xf4
   show_stack+0x20/0x30
   dump_stack_lvl+0x9c/0xd8
   dump_stack+0x18/0x34
   __might_resched+0x188/0x224
   rt_spin_lock+0x64/0x110
   alloc_iova_fast+0x1ac/0x2ac
   iommu_dma_alloc_iova+0xd4/0x110
   __iommu_dma_map+0x80/0x144
   iommu_dma_map_page+0xe8/0x260
   dma_map_page_attrs+0xb4/0xc0
   __otx2_alloc_rbuf+0x90/0x150
   otx2_rq_aura_pool_init+0x1c8/0x284
   otx2_init_hw_resources+0xe4/0x3a4
   otx2_open+0xf0/0x610
   __dev_open+0x104/0x224
   __dev_change_flags+0x1e4/0x274
   dev_change_flags+0x2c/0x7c
   ic_open_devs+0x124/0x2f8
   ip_auto_config+0x180/0x42c
   do_one_initcall+0x90/0x4dc
   do_basic_setup+0x10c/0x14c
   kernel_init_freeable+0x10c/0x13c
   kernel_init+0x2c/0x140
   ret_from_fork+0x10/0x20

Of course, we can shuffle the get/put_cpu() to only wrap the invocation
of ->aura_freeptr() as what commit 87b93b6 does. But there are only
two ->aura_freeptr() callbacks, otx2_aura_freeptr() and
cn10k_aura_freeptr(). There is no usage of perpcu variable in the
otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it.
We can move the get/put_cpu() into the corresponding callback which
really has the percpu variable usage and avoid the sprinkling of
get/put_cpu() in several places.

Fixes: 4af1b64 ("octeontx2-pf: Fix lmtst ID used in aura free")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://lore.kernel.org/r/20230118071300.3271125-1-haokexin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
arm000 pushed a commit that referenced this pull request Jan 30, 2023
…kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.2, take #2

- Pass the correct address to mte_clear_page_tags() on initialising
  a tagged page

- Plug a race against a GICv4.1 doorbell interrupt while saving
  the vgic-v3 pending state.
arm000 pushed a commit that referenced this pull request Feb 5, 2023
Jakub Sitnicki says:

====================

This patch set addresses the syzbot report in [1].

Patch #1 has been suggested by Eric [2]. I extended it to cover the rest of
sock_map proto callbacks. Otherwise we would still overflow the stack.

Patch #2 contains the actual fix and bug analysis.
Patches #3 & #4 add coverage to selftests to trigger the bug.

[1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/
[2] https://lore.kernel.org/all/CANn89iK2UN1FmdUcH12fv_xiZkv2G+Nskvmq7fG6aA_6VKRf6g@mail.gmail.com/
---
v1 -> v2:
v1: https://lore.kernel.org/r/20230113-sockmap-fix-v1-0-d3cad092ee10@cloudflare.com
[v1 didn't hit bpf@ ML by mistake]

 * pull in Eric's patch to protect against recursion loop bugs (Eric)
 * add a macro helper to check if pointer is inside a memory range (Eric)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fishwaldo pushed a commit to Fishwaldo/linux-bl808 that referenced this pull request Mar 19, 2023
…eues().

Christoph Paasch reported that commit b5fc292 ("inet6: Remove
inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering
WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues().  [0 - 2]
Also, we can reproduce it by a program in [3].

In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy()
to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in
inet_csk_destroy_sock().

The same check has been in inet_sock_destruct() from at least v2.6,
we can just remove the WARN_ON_ONCE().  However, among the users of
sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct().
Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor().

[0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/
[1]: multipath-tcp/mptcp_net-next#341
[2]:
WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0
Modules linked in:
CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 arm000#2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9
RSP: 0018:ffff88810570fc68 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005
RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488
R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458
FS:  00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 inet_csk_destroy_sock+0x1a1/0x320
 __tcp_close+0xab6/0xe90
 tcp_close+0x30/0xc0
 inet_release+0xe9/0x1f0
 inet6_release+0x4c/0x70
 __sock_release+0xd2/0x280
 sock_close+0x15/0x20
 __fput+0x252/0xa20
 task_work_run+0x169/0x250
 exit_to_user_mode_prepare+0x113/0x120
 syscall_exit_to_user_mode+0x1d/0x40
 do_syscall_64+0x48/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f7fdf7ae28d
Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d
RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f
R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000
R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8
 </TASK>

[3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/

Fixes: b5fc292 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Christoph Paasch <christophpaasch@icloud.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments