Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Feb 9, 2014

No description provided.

cwimmer-altiscale added a commit that referenced this pull request Feb 9, 2014
README: fix instructions for cloning altiscale repo
@cwimmer-altiscale cwimmer-altiscale merged commit d07b763 into master Feb 9, 2014
@cwimmer-altiscale cwimmer-altiscale deleted the dc_fix_readme branch February 9, 2014 18:57
ghost pushed a commit that referenced this pull request Feb 9, 2014
commit 66efdc7 upstream.

snd_seq_timer_open() didn't catch the whole error path but let through
if the timer id is a slave.  This may lead to Oops by accessing the
uninitialized pointer.

 BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae
 IP: [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
 PGD 785cd067 PUD 76964067 PMD 0
 Oops: 0002 [#4] SMP
 CPU 0
 Pid: 4288, comm: trinity-child7 Tainted: G      D W 3.9.0-rc1+ #100 Bochs Bochs
 RIP: 0010:[<ffffffff819b3477>]  [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
 RSP: 0018:ffff88006ece7d38  EFLAGS: 00010246
 RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000
 RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38
 RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007
 FS:  00007f7aaa650700(0000) GS:ffff88007f800000(0000) GS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290)
 Stack:
  0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d
  65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000
  ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520
 Call Trace:
  [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff819b17e9>] snd_seq_queue_timer_open+0x29/0x70
  [<ffffffff819ae01a>] snd_seq_ioctl_set_queue_timer+0xda/0x120
  [<ffffffff819acb9b>] snd_seq_do_ioctl+0x9b/0xd0
  [<ffffffff819acbe0>] snd_seq_ioctl+0x10/0x20
  [<ffffffff811b9542>] do_vfs_ioctl+0x522/0x570
  [<ffffffff8130a4b3>] ? file_has_perm+0x83/0xa0
  [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff811b95ed>] sys_ioctl+0x5d/0xa0
  [<ffffffff813663fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81faed69>] system_call_fastpath+0x16/0x1b

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
ghost pushed a commit that referenced this pull request Feb 9, 2014
commit 1a7bbda upstream.

We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.

This fixes the case when an user tries to suspend the machine
and it blows up in save_processor_state b/c 'read_cr8' is set
to NULL and we get:

kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[<ffffffff814d5f42>]  [<ffffffff814d5f42>] save_processor_state+0x212/0x270

.. snip..
Call Trace:
 [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
 [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
As part of normal operaions, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
  hrtimer locks -> timekeeping locks

clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
between the timekeeping the hrtimer subsystem, so that we could
notify the hrtimer subsytem the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeing code.

But unfortunately the lock chains are complex enoguh that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.

Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abrieviated) message:

[  251.100221] ======================================================
[  251.100221] [ INFO: possible circular locking dependency detected ]
[  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[  251.101967] -------------------------------------------------------
[  251.101967] kworker/10:1/4506 is trying to acquire lock:
[  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[  251.101967]
[  251.101967] but task is already holding lock:
[  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] which lock already depends on the new lock.
[  251.101967]
[  251.101967]
[  251.101967] the existing dependency chain (in reverse order) is:
[  251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
[  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[  251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[  251.101967] other info that might help us debug this:
[  251.101967]
[  251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11

[  251.101967]  Possible unsafe locking scenario:
[  251.101967]
[  251.101967]        CPU0                    CPU1
[  251.101967]        ----                    ----
[  251.101967]   lock(hrtimer_bases.lock#11);
[  251.101967]                                lock(&rt_b->rt_runtime_lock);
[  251.101967]                                lock(hrtimer_bases.lock#11);
[  251.101967]   lock(timekeeper_seq);
[  251.101967]
[  251.101967]  *** DEADLOCK ***
[  251.101967]
[  251.101967] 3 locks held by kworker/10:1/4506:
[  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] stack backtrace:
[  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[  251.101967] Workqueue: events clock_was_set_work

So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead using a flag variable to
decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortuantely, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: stable <stable@vger.kernel.org> #3.10+
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Hayes Wang says:

====================
support new chip

Remove the trailing "/* CRC */" for patch #3.

Change the return value type of rtl_ops_init() from int to boolean
for patch #4.

Replace VENDOR_ID_SAMSUNG with SAMSUNG_VENDOR_ID for patch #6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Use helper functions to simplify _DSM related code in TPM driver.

This patch also help to get rid of following warning messages:
[  163.509575] ACPI Error: Incorrect return type [Buffer] requested [Package]
(20130517/nsxfeval-135)

But there is still an warning left.
[  181.637366] ACPI Warning: \_SB_.IIO0.LPC0.TPM_._DSM: Argument #4 type
mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
When a task enters call_refreshresult with status 0 from call_refresh and
!rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
or max number of retries.

Instead of trying forever, make use of the retry path that other errors use.

This only seems to be possible when the crrefresh callback is gss_refresh_null,
which only happens when destroying the context.

To reproduce:

1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
   operations).

2) reboot - the client will be stuck and will need to be hard rebooted

BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
irq event stamp: 195724
hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
RSP: 0018:ffff880079003d18  EFLAGS: 00000246
RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
Stack:
 ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
Call Trace:
 [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffff81052974>] process_one_work+0x211/0x3a5
 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
 [<ffffffff81052eeb>] worker_thread+0x134/0x202
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff810584a0>] kthread+0xc9/0xd1
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00

And the output of "rpcdebug -m rpc -s all":

RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Jon Maloy says:

====================
tipc: link setup and failover improvements

This series consists of four unrelated commits with different purposes.

- Commit #1 is purely cosmetic and pedagogic, hopefully making the
  failover/tunneling logics slightly easier to understand.
- Commit #2 fixes a bug that has always been in the code, but was not
  discovered until very recently.
- Commit #3 fixes a non-fatal race issue in the neighbour discovery
  code.
- Commit #4 removes an unnecessary indirection step during link
  startup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Avoid circular mutex lock by pushing the dev->lock to the .fini
callback on each extension.

As em28xx-dvb, em28xx-alsa and em28xx-rc have their own data
structures, and don't touch at the common structure during .fini,
only em28xx-v4l needs to be locked.

[   90.994317] ======================================================
[   90.994356] [ INFO: possible circular locking dependency detected ]
[   90.994395] 3.13.0-rc1+ #24 Not tainted
[   90.994427] -------------------------------------------------------
[   90.994458] khubd/54 is trying to acquire lock:
[   90.994490]  (&card->controls_rwsem){++++.+}, at: [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.994656]
[   90.994656] but task is already holding lock:
[   90.994688]  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]
[   90.994843]
[   90.994843] which lock already depends on the new lock.
[   90.994843]
[   90.994874]
[   90.994874] the existing dependency chain (in reverse order) is:
[   90.994905]
-> #1 (&dev->lock){+.+.+.}:
[   90.995057]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995121]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995182]        [<ffffffff816a5b6c>] mutex_lock_nested+0x5c/0x3c0
[   90.995245]        [<ffffffffa0422cca>] em28xx_vol_put_mute+0x1ba/0x1d0 [em28xx_alsa]
[   90.995309]        [<ffffffffa017813d>] snd_ctl_elem_write+0xfd/0x140 [snd]
[   90.995376]        [<ffffffffa01791c2>] snd_ctl_ioctl+0xe2/0x810 [snd]
[   90.995442]        [<ffffffff811db8b0>] do_vfs_ioctl+0x300/0x520
[   90.995504]        [<ffffffff811dbb51>] SyS_ioctl+0x81/0xa0
[   90.995568]        [<ffffffff816b1929>] system_call_fastpath+0x16/0x1b
[   90.995630]
-> #0 (&card->controls_rwsem){++++.+}:
[   90.995780]        [<ffffffff810b7a47>] check_prevs_add+0x947/0x950
[   90.995841]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995901]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995962]        [<ffffffff816a762b>] down_write+0x3b/0xa0
[   90.996022]        [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.996088]        [<ffffffffa017a255>] snd_device_free+0x65/0x140 [snd]
[   90.996154]        [<ffffffffa017a751>] snd_device_free_all+0x61/0xa0 [snd]
[   90.996219]        [<ffffffffa0173af4>] snd_card_do_free+0x14/0x130 [snd]
[   90.996283]        [<ffffffffa0173f14>] snd_card_free+0x84/0x90 [snd]
[   90.996349]        [<ffffffffa0423397>] em28xx_audio_fini+0x97/0xb0 [em28xx_alsa]
[   90.996411]        [<ffffffffa040dba6>] em28xx_close_extension+0x56/0x90 [em28xx]
[   90.996475]        [<ffffffffa040f639>] em28xx_usb_disconnect+0x79/0x90 [em28xx]
[   90.996539]        [<ffffffff814a06e7>] usb_unbind_interface+0x67/0x1d0
[   90.996620]        [<ffffffff8142920f>] __device_release_driver+0x7f/0xf0
[   90.996682]        [<ffffffff814292a5>] device_release_driver+0x25/0x40
[   90.996742]        [<ffffffff81428b0c>] bus_remove_device+0x11c/0x1a0
[   90.996801]        [<ffffffff81425536>] device_del+0x136/0x1d0
[   90.996863]        [<ffffffff8149e0c0>] usb_disable_device+0xb0/0x290
[   90.996923]        [<ffffffff814930c5>] usb_disconnect+0xb5/0x1d0
[   90.996984]        [<ffffffff81495ab6>] hub_port_connect_change+0xd6/0xad0
[   90.997044]        [<ffffffff814967c3>] hub_events+0x313/0x9b0
[   90.997105]        [<ffffffff81496e95>] hub_thread+0x35/0x170
[   90.997165]        [<ffffffff8108ea2f>] kthread+0xff/0x120
[   90.997226]        [<ffffffff816b187c>] ret_from_fork+0x7c/0xb0
[   90.997287]
[   90.997287] other info that might help us debug this:
[   90.997287]
[   90.997318]  Possible unsafe locking scenario:
[   90.997318]
[   90.997348]        CPU0                    CPU1
[   90.997378]        ----                    ----
[   90.997408]   lock(&dev->lock);
[   90.997497]                                lock(&card->controls_rwsem);
[   90.997607]                                lock(&dev->lock);
[   90.997697]   lock(&card->controls_rwsem);
[   90.997786]
[   90.997786]  *** DEADLOCK ***
[   90.997786]
[   90.997817] 5 locks held by khubd/54:
[   90.997847]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81496564>] hub_events+0xb4/0x9b0
[   90.998025]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81493076>] usb_disconnect+0x66/0x1d0
[   90.998204]  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8142929d>] device_release_driver+0x1d/0x40
[   90.998383]  #3:  (em28xx_devlist_mutex){+.+.+.}, at: [<ffffffffa040db77>] em28xx_close_extension+0x27/0x90 [em28xx]
[   90.998567]  #4:  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]

Reviewed-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Tested-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Fix broken inline assembly contraints for cmpxchg64 on 32bit.

Fixes this crash:

specification exception: 0006 [#1] SMP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0 #4
task: 005a16c8 ti: 00592000 task.ti: 00592000
Krnl PSW : 070ce000 8029abd6 (lockref_get+0x3e/0x9c)
...
Krnl Code: 8029abcc: a71a0001           ahi     %r1,1
           8029abd0: 1852               lr      %r5,%r2
          #8029abd2: bb40f064           cds     %r4,%r0,100(%r15)
          >8029abd6: 1943               cr      %r4,%r3
           8029abd8: 1815               lr      %r1,%r5
Call Trace:
([<0000000078e01870>] 0x78e01870)
 [<000000000021105a>] sysfs_mount+0xd2/0x1c8
 [<00000000001b551e>] mount_fs+0x3a/0x134
 [<00000000001ce768>] vfs_kern_mount+0x44/0x11c
 [<00000000001ce864>] kern_mount_data+0x24/0x3c
 [<00000000005cc4b8>] sysfs_init+0x74/0xd4
 [<00000000005cb5b4>] mnt_init+0xe0/0x1fc
 [<00000000005cb16a>] vfs_caches_init+0xb6/0x14c
 [<00000000005be794>] start_kernel+0x318/0x33c
 [<000000000010001c>] _stext+0x1c/0x80

Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
skb_kill_datagram() does not dequeue the skb when MSG_PEEK is unset.
This leaves a free'd skb on the queue, resulting a double-free later.

Without this, the following oops can occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8154fcf7>] skb_dequeue+0x47/0x70
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: af_rxrpc ...
CPU: 0 PID: 1191 Comm: listen Not tainted 3.12.0+ #4
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8801183536b0 ti: ffff880035c92000 task.ti: ffff880035c92000
RIP: 0010:[<ffffffff8154fcf7>] skb_dequeue+0x47/0x70
RSP: 0018:ffff880035c93db8  EFLAGS: 00010097
RAX: 0000000000000246 RBX: ffff8800d2754b00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8800d254c084
RBP: ffff880035c93dd0 R08: ffff880035c93cf0 R09: ffff8800d968f270
R10: 0000000000000000 R11: 0000000000000293 R12: ffff8800d254c070
R13: ffff8800d254c084 R14: ffff8800cd861240 R15: ffff880119b39720
FS:  00007f37a969d740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000000d4413000 CR4: 00000000000006f0
Stack:
 ffff8800d254c000 ffff8800d254c070 ffff8800d254c2c0 ffff880035c93df8
 ffffffffa041a5b8 ffff8800cd844c80 ffffffffa04385a0 ffff8800cd844cb0
 ffff880035c93e18 ffffffff81546cef ffff8800d45fea00 0000000000000008
Call Trace:
 [<ffffffffa041a5b8>] rxrpc_release+0x128/0x2e0 [af_rxrpc]
 [<ffffffff81546cef>] sock_release+0x1f/0x80
 [<ffffffff81546d62>] sock_close+0x12/0x20
 [<ffffffff811aaba1>] __fput+0xe1/0x230
 [<ffffffff811aad3e>] ____fput+0xe/0x10
 [<ffffffff810862cc>] task_work_run+0xbc/0xe0
 [<ffffffff8106a3be>] do_exit+0x2be/0xa10
 [<ffffffff8116dc47>] ? do_munmap+0x297/0x3b0
 [<ffffffff8106ab8f>] do_group_exit+0x3f/0xa0
 [<ffffffff8106ac04>] SyS_exit_group+0x14/0x20
 [<ffffffff8166b069>] system_call_fastpath+0x16/0x1b


Signed-off-by: Tim Smith <tim@electronghost.co.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Given now we have 2 spinlock for management of delayed refs,
CONFIG_DEBUG_SPINLOCK=y helped me find this,

[ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
[ 4723.414882]  lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
[ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G        W  O 3.12.0+ #4
[ 4723.421321] Call Trace:
[ 4723.421872]  [<ffffffff81680fe7>] dump_stack+0x54/0x74
[ 4723.422753]  [<ffffffff81681093>] spin_dump+0x8c/0x91
[ 4723.424979]  [<ffffffff816810b9>] spin_bug+0x21/0x26
[ 4723.425846]  [<ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
[ 4723.434424]  [<ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
[ 4723.438747]  [<ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
[ 4723.443321]  [<ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
[ 4723.444692]  [<ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 4723.450336]  [<ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
[ 4723.451332]  [<ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
[ 4723.452543]  [<ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[ 4723.457833]  [<ffffffff81079efa>] kthread+0xea/0xf0
[ 4723.458990]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.460133]  [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[ 4723.460865]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.496521] ------------[ cut here ]------------

----------------------------------------------------------------------

The reason is that we get to call cond_resched_lock(&head_ref->lock) while
still holding @delayed_refs->lock.

So it's different with __btrfs_run_delayed_refs(), where we do drop-acquire
dance before and after actually processing delayed refs.

Here we don't drop the lock, others are not able to add new delayed refs to
head_ref, so cond_resched_lock(&head_ref->lock) is not necessary here.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Guenter Roeck has got the following call trace on a p2020 board:
  Kernel stack overflow in process eb3e5a00, r1=eb79df90
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  task: eb3e5a00 ti: c0616000 task.ti: ef440000
  NIP: c003a420 LR: c003a410 CTR: c0017518
  REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
  MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
  GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
  GPR08: 00000000 020b8000 00000000 00000000 44008442
  NIP [c003a420] __do_softirq+0x94/0x1ec
  LR [c003a410] __do_softirq+0x84/0x1ec
  Call Trace:
  [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
  [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
  [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
  [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
  [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
  --- Exception: 501 at 0xfcda524
      LR = 0x10024900
  Instruction dump:
  7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
  5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
  Kernel panic - not syncing: kernel stack overflow
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  Call Trace:

The reason is that we have used the wrong register to calculate the
ksp_limit in commit cbc9565 (powerpc: Remove ksp_limit on ppc64).
Just fix it.

As suggested by Benjamin Herrenschmidt, also add the C prototype of the
function in the comment in order to avoid such kind of errors in the
future.

Cc: stable@vger.kernel.org # 3.12
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Currently we're using GFP_KERNEL, however there are some path(s) where we
can hold some spinlocks, specifically bond->curr_slave_lock:

[    4.722916] BUG: sleeping function called from invalid context at mm/slub.c:965
[    4.724438] in_atomic(): 1, irqs_disabled(): 0, pid: 940, name: ifup-eth
[    4.726034] 5 locks held by ifup-eth/940:
...snip...
[    4.734646]  #4:  (&bond->curr_slave_lock){+...+.}, at: [<ffffffffa00badc6>] bond_enslave+0xda6/0xdd0 [bonding]
...snip...
[    4.759081]  [<ffffffffa00b6f11>] bond_change_active_slave+0x191/0x3b0 [bonding]
[    4.760917]  [<ffffffffa00b7227>] bond_select_active_slave+0xf7/0x1d0 [bonding]
[    4.762751]  [<ffffffffa00badce>] bond_enslave+0xdae/0xdd0 [bonding]
...snip...

As it's out of hot path and is a really rare event - change the gfp_t flags
to GFP_ATOMIC to avoid sleeping under spinlock.

v2: convert new notify calls to GFP_ATOMIC.

CC: Thomas Glanzmann <thomas@glanzmann.de>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
Zoltan Kiss says:

====================
xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy

A long known problem of the upstream netback implementation that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge perfomance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore haven't progressed very well.
This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower AMD
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch)
Based on my investigations the packet get only copied if it is delivered to
Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects
DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this usecase.
In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoiding TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I've tried to broke it down to smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Use skb->cb to store pending_idx
2: Some refactoring
3: Change RX path for mapped SKB fragments (moved here to keep bisectability,
review it after #4)
4: Introduce TX grant mapping
5: Remove old TX grant copy definitons and fix indentations
6: Add stat counters for zerocopy
7: Handle guests with too many frags
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling the important use case when an older guest
sends lots of slots. Instead of delayed copy now we timeout packets on the RX
path, based on the assumption that otherwise packets should get stucked
anywhere else. Finally some unmap batching to avoid too much TLB flush

v3: Apart from fixing a few things mentioned in responses the important change
is the use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.

v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.

v5: Only minor fixes based on Wei's comments

v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see
first 2 patches. Also rework of handling packets with too many slots, and
reorder the series a bit.

v7: Small fixes in comments/log messages/error paths, and merging the frag
overflow stats patch into its parent.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363
====================

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
[13654.480669] ======================================================
[13654.480905] [ INFO: possible circular locking dependency detected ]
[13654.481003] 3.12.0+ #4 Tainted: G        W  O
[13654.481060] -------------------------------------------------------
[13654.481060] btrfs-transacti/9347 is trying to acquire lock:
[13654.481060]  (&(&root->ordered_extent_lock)->rlock){+.+...}, at: [<ffffffffa02d30a1>] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060] but task is already holding lock:
[13654.481060]  (&(&fs_info->ordered_root_lock)->rlock){+.+...}, at: [<ffffffffa02d3015>] btrfs_cleanup_transaction+0x1e5/0x570 [btrfs]
[13654.481060] which lock already depends on the new lock.

[13654.481060] the existing dependency chain (in reverse order) is:
[13654.481060] -> #1 (&(&fs_info->ordered_root_lock)->rlock){+.+...}:
[13654.481060]        [<ffffffff810c4103>] lock_acquire+0x93/0x130
[13654.481060]        [<ffffffff81689991>] _raw_spin_lock+0x41/0x50
[13654.481060]        [<ffffffffa02f011b>] __btrfs_add_ordered_extent+0x39b/0x450 [btrfs]
[13654.481060]        [<ffffffffa02f0202>] btrfs_add_ordered_extent+0x32/0x40 [btrfs]
[13654.481060]        [<ffffffffa02df6aa>] run_delalloc_nocow+0x78a/0x9d0 [btrfs]
[13654.481060]        [<ffffffffa02dfc0d>] run_delalloc_range+0x31d/0x390 [btrfs]
[13654.481060]        [<ffffffffa02f7c00>] __extent_writepage+0x310/0x780 [btrfs]
[13654.481060]        [<ffffffffa02f830a>] extent_write_cache_pages.isra.29.constprop.48+0x29a/0x410 [btrfs]
[13654.481060]        [<ffffffffa02f879d>] extent_writepages+0x4d/0x70 [btrfs]
[13654.481060]        [<ffffffffa02d9f68>] btrfs_writepages+0x28/0x30 [btrfs]
[13654.481060]        [<ffffffff8114be91>] do_writepages+0x21/0x50
[13654.481060]        [<ffffffff81140d49>] __filemap_fdatawrite_range+0x59/0x60
[13654.481060]        [<ffffffff81140e13>] filemap_fdatawrite_range+0x13/0x20
[13654.481060]        [<ffffffffa02f1db9>] btrfs_wait_ordered_range+0x49/0x140 [btrfs]
[13654.481060]        [<ffffffffa0318fe2>] __btrfs_write_out_cache+0x682/0x8b0 [btrfs]
[13654.481060]        [<ffffffffa031952d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
[13654.481060]        [<ffffffffa02c7083>] btrfs_write_dirty_block_groups+0x593/0x680 [btrfs]
[13654.481060]        [<ffffffffa0345307>] commit_cowonly_roots+0x14b/0x20d [btrfs]
[13654.481060]        [<ffffffffa02d7c1a>] btrfs_commit_transaction+0x43a/0x9d0 [btrfs]
[13654.481060]        [<ffffffffa030061a>] btrfs_create_uuid_tree+0x5a/0x100 [btrfs]
[13654.481060]        [<ffffffffa02d5a8a>] open_ctree+0x21da/0x2210 [btrfs]
[13654.481060]        [<ffffffffa02ab6fe>] btrfs_mount+0x68e/0x870 [btrfs]
[13654.481060]        [<ffffffff811b2409>] mount_fs+0x39/0x1b0
[13654.481060]        [<ffffffff811cd653>] vfs_kern_mount+0x63/0xf0
[13654.481060]        [<ffffffff811cfcce>] do_mount+0x23e/0xa90
[13654.481060]        [<ffffffff811d05a3>] SyS_mount+0x83/0xc0
[13654.481060]        [<ffffffff81692b52>] system_call_fastpath+0x16/0x1b
[13654.481060] -> #0 (&(&root->ordered_extent_lock)->rlock){+.+...}:
[13654.481060]        [<ffffffff810c340a>] __lock_acquire+0x150a/0x1a70
[13654.481060]        [<ffffffff810c4103>] lock_acquire+0x93/0x130
[13654.481060]        [<ffffffff81689991>] _raw_spin_lock+0x41/0x50
[13654.481060]        [<ffffffffa02d30a1>] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060]        [<ffffffffa02d35ce>] transaction_kthread+0x22e/0x270 [btrfs]
[13654.481060]        [<ffffffff81079efa>] kthread+0xea/0xf0
[13654.481060]        [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[13654.481060] other info that might help us debug this:

[13654.481060]  Possible unsafe locking scenario:

[13654.481060]        CPU0                    CPU1
[13654.481060]        ----                    ----
[13654.481060]   lock(&(&fs_info->ordered_root_lock)->rlock);
[13654.481060]				 lock(&(&root->ordered_extent_lock)->rlock);
[13654.481060]				 lock(&(&fs_info->ordered_root_lock)->rlock);
[13654.481060]   lock(&(&root->ordered_extent_lock)->rlock);
[13654.481060]
 *** DEADLOCK ***
[...]

======================================================

btrfs_destroy_all_ordered_extents()
gets &fs_info->ordered_root_lock __BEFORE__ acquiring &root->ordered_extent_lock,
while btrfs_[add,remove]_ordered_extent()
acquires &fs_info->ordered_root_lock __AFTER__ getting &root->ordered_extent_lock.

This patch fixes the above problem.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
…to next/dt

Merge "mvebu dt changes for v3.15 (incremental #4)" from Jason Cooper:

 - dove
    - add system controller node
    - drop pinctrl PMU reg property _before_ it hits mainline and becomes ABI

 - mvebu
    - XP/370
       - change default PCIe apertures
       - switch GP and DB boards internal registers to 0xf1000000
       - correct RAM size on Matrix board
    - 385
       - correct phy connection type for DB board
       - add RD board

* tag 'mvebu-dt-3.15-4' of git://git.infradead.org/linux-mvebu:
  ARM: dove: drop pinctrl PMU reg property
  ARM: mvebu: add Device Tree for the Armada 385 RD board
  ARM: mvebu: use the correct phy connection mode on Armada 385 DB
  ARM: mvebu: the Armada XP Matrix board has 4 GB
  ARM: mvebu: switch the Armada XP GP to use internal registers at 0xf1000000
  ARM: mvebu: switch the Armada XP DB to use internal registers at 0xf1000000
  ARM: mvebu: change the default PCIe apertures for Armada 370/XP
  ARM: dove: add system controller node

Conflicts:
	arch/arm/boot/dts/Makefile

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
I'm transitioning maintainership of the xHCI driver to my colleague,
Mathias Nyman.  The xHCI driver is in good shape, and it's time for me
to move on to the next shiny thing. :)

There's a few known outstanding bugs that we have plans for how to fix:

1. Clear Halt issue that means some USB scanners fail after one scan
2. TD fragment issue that means USB ethernet scatter-gather doesn't work
3. xHCI command queue issues that cause the driver to die when a USB
   device doesn't respond to a Set Address control transfer when another
   command is outstanding.
4. USB port power off for Haswell-ULT is a complete disaster.

Mathias is putting the finishing touches on a fix for #3, which will
make it much easier to craft a solution for #1.  Dan William has an
ACKed RFC for #4 that may land in 3.16, after much testing.  I'm working
with Mathias to come up with an architectural solution for #2.

I don't foresee very many big features coming down the pipe for USB
(which is part of the reason it's a good time to change now).  SSIC is
mostly a hardware-level change (perhaps with some PHY drivers needed),
USB 3.1 is again mostly a hardware-level change with some software
engineering to communicate the speed increase to the device drivers, add
new device descriptor parsing to lsusb, but definitely nothing as big as
USB 3.0 was.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@intel.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +->spin_lock remote CPU#1        ovs_flow_stats_get()
 |  <interrupted>                  stats_read()
 |  ...                       +-->  spin_lock remote CPU#0
 |                            |     <interrupted>
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <jesse@nicira.com>

=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
 [<ffffffff810f8963>] mark_lock+0x223/0x2b0
 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
 [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
virt_to_page() is incredibly inefficient when virt-to-phys patching is
enabled.  This is because we end up with this calculation:

  page = &mem_map[asm virt_to_phys(addr) >> 12 - __pv_phys_offset >> 12]

in assembly.  The asm virt_to_phys() is equivalent this this operation:

  addr - PAGE_OFFSET + __pv_phys_offset

and we can see that because this is assembly, the compiler has no chance
to optimise some of that away.  This should reduce down to:

  page = &mem_map[(addr - PAGE_OFFSET) >> 12]

for the common cases.  Permit the compiler to make this optimisation by
giving it more of the information it needs - do this by providing a
virt_to_pfn() macro.

Another issue which makes this more complex is that __pv_phys_offset is
a 64-bit type on all platforms.  This is needlessly wasteful - if we
store the physical offset as a PFN, we can save a lot of work having
to deal with 64-bit values, which sometimes ends up producing incredibly
horrid code:

     a4c:       e3009000        movw    r9, #0
                        a4c: R_ARM_MOVW_ABS_NC  __pv_phys_offset
     a50:       e3409000        movt    r9, #0          ; r9 = &__pv_phys_offset
                        a50: R_ARM_MOVT_ABS     __pv_phys_offset
     a54:       e3002000        movw    r2, #0
                        a54: R_ARM_MOVW_ABS_NC  __pv_phys_offset
     a58:       e3402000        movt    r2, #0          ; r2 = &__pv_phys_offset
                        a58: R_ARM_MOVT_ABS     __pv_phys_offset
     a5c:       e5999004        ldr     r9, [r9, #4]    ; r9 = high word of __pv_phys_offset
     a60:       e3001000        movw    r1, #0
                        a60: R_ARM_MOVW_ABS_NC  mem_map
     a64:       e592c000        ldr     ip, [r2]        ; ip = low word of __pv_phys_offset

Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
With EXT4FS_DEBUG ext4_count_free_clusters() will call
ext4_read_block_bitmap() without s_group_info initialized, so we need to
initialize multi-block allocator before.

And dependencies that must be solved, to allow this:
- multi-block allocator needs in group descriptors
- need to install s_op before initializing multi-block allocator,
  because in ext4_mb_init_backend() new inode is created.
- initialize number of group desc blocks (s_gdb_count) otherwise
  number of clusters returned by ext4_free_clusters_after_init() is not correct.
  (see ext4_bg_num_gdb_nometa())

Here is the stack backtrace:

(gdb) bt
 #0  ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430
 #1  ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000,
     desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0,
     bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358
 #2  0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000,
     block_group=block_group@entry=0,
     bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476
 #3  0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000,
     block_group=block_group@entry=0) at balloc.c:489
 #4  0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665
 #5  0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>,
     sb=0xffff880079a10000) at super.c:2143
 #6  ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>,
     data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851
     ...

Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
There was a deadlock in monitor mode when we were setting the
channel if the channel was not 1.

======================================================
[ INFO: possible circular locking dependency detected ]
3.14.3 #4 Not tainted
-------------------------------------------------------
iw/3323 is trying to acquire lock:
 (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffa062e2f2>] ieee80211_vif_release_channel+0x42/0xb0 [mac80211]

but task is already holding lock:
 (&local->iflist_mtx){+.+...}, at: [<ffffffffa0609e0a>] ieee80211_set_monitor_channel+0x5a/0x1b0 [mac80211]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&local->iflist_mtx){+.+...}:
       [<ffffffff810d95bb>] __lock_acquire+0xb3b/0x13b0
       [<ffffffff810d9ee0>] lock_acquire+0xb0/0x1f0
       [<ffffffff817eb9c8>] mutex_lock_nested+0x78/0x4f0
       [<ffffffffa06225cf>] ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
       [<ffffffffa0518189>] iwl_mvm_recalc_multicast+0x49/0xa0 [iwlmvm]
       [<ffffffffa051822e>] iwl_mvm_configure_filter+0x4e/0x70 [iwlmvm]
       [<ffffffffa05e6d43>] ieee80211_configure_filter+0x153/0x5f0 [mac80211]
       [<ffffffffa05e71f5>] ieee80211_reconfig_filter+0x15/0x20 [mac80211]
       [snip]

-> #1 (&mvm->mutex){+.+.+.}:
       [<ffffffff810d95bb>] __lock_acquire+0xb3b/0x13b0
       [<ffffffff810d9ee0>] lock_acquire+0xb0/0x1f0
       [<ffffffff817eb9c8>] mutex_lock_nested+0x78/0x4f0
       [<ffffffffa0517246>] iwl_mvm_add_chanctx+0x56/0xe0 [iwlmvm]
       [<ffffffffa062ca1e>] ieee80211_new_chanctx+0x13e/0x410 [mac80211]
       [<ffffffffa062d953>] ieee80211_vif_use_channel+0x1c3/0x5a0 [mac80211]
       [<ffffffffa06035ab>] ieee80211_add_virtual_monitor+0x1ab/0x6b0 [mac80211]
       [<ffffffffa06052ea>] ieee80211_do_open+0xe6a/0x15a0 [mac80211]
       [<ffffffffa0605a79>] ieee80211_open+0x59/0x60 [mac80211]
       [snip]

-> #0 (&local->chanctx_mtx){+.+.+.}:
       [<ffffffff810d6cb7>] check_prevs_add+0x977/0x980
       [<ffffffff810d95bb>] __lock_acquire+0xb3b/0x13b0
       [<ffffffff810d9ee0>] lock_acquire+0xb0/0x1f0
       [<ffffffff817eb9c8>] mutex_lock_nested+0x78/0x4f0
       [<ffffffffa062e2f2>] ieee80211_vif_release_channel+0x42/0xb0 [mac80211]
       [<ffffffffa0609ec3>] ieee80211_set_monitor_channel+0x113/0x1b0 [mac80211]
       [<ffffffffa058fb37>] cfg80211_set_monitor_channel+0x77/0x2b0 [cfg80211]
       [<ffffffffa056e0b2>] __nl80211_set_channel+0x122/0x140 [cfg80211]
       [<ffffffffa0581374>] nl80211_set_wiphy+0x284/0xaf0 [cfg80211]
       [snip]

other info that might help us debug this:

Chain exists of:
  &local->chanctx_mtx --> &mvm->mutex --> &local->iflist_mtx

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&local->iflist_mtx);
                               lock(&mvm->mutex);
                               lock(&local->iflist_mtx);
  lock(&local->chanctx_mtx);

 *** DEADLOCK ***

This deadlock actually occurs:
INFO: task iw:3323 blocked for more than 120 seconds.
      Not tainted 3.14.3 #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
iw              D ffff8800c8afcd80  4192  3323   3322 0x00000000
 ffff880078fdb7e0 0000000000000046 ffff8800c8afcd80 ffff880078fdbfd8
 00000000001d5540 00000000001d5540 ffff8801141b0000 ffff8800c8afcd80
 ffff880078ff9e38 ffff880078ff9e38 ffff880078ff9e40 0000000000000246
Call Trace:
 [<ffffffff817ea841>] schedule_preempt_disabled+0x31/0x80
 [<ffffffff817ebaed>] mutex_lock_nested+0x19d/0x4f0
 [<ffffffffa06225cf>] ? ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
 [<ffffffffa06225cf>] ? ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
 [<ffffffffa052a680>] ? iwl_mvm_power_mac_update_mode+0xc0/0xc0 [iwlmvm]
 [<ffffffffa06225cf>] ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
 [<ffffffffa0529357>] _iwl_mvm_power_update_binding+0x27/0x80 [iwlmvm]
 [<ffffffffa0516eb1>] iwl_mvm_unassign_vif_chanctx+0x81/0xc0 [iwlmvm]
 [<ffffffffa062d3ff>] __ieee80211_vif_release_channel+0xdf/0x470 [mac80211]
 [<ffffffffa062e2fa>] ieee80211_vif_release_channel+0x4a/0xb0 [mac80211]
 [<ffffffffa0609ec3>] ieee80211_set_monitor_channel+0x113/0x1b0 [mac80211]
 [<ffffffffa058fb37>] cfg80211_set_monitor_channel+0x77/0x2b0 [cfg80211]
 [<ffffffffa056e0b2>] __nl80211_set_channel+0x122/0x140 [cfg80211]
 [<ffffffffa0581374>] nl80211_set_wiphy+0x284/0xaf0 [cfg80211]

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=75541

Cc: <stable@vger.kernel.org> [3.13+]
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
After 96d365e ("cgroup: make css_set_lock a rwsem and rename it
to css_set_rwsem"), css task iterators requires sleepable context as
it may block on css_set_rwsem.  I missed that cgroup_freezer was
iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
errors like the following on freezer state reads and transitions.

  BUG: sleeping function called from invalid context at /work
 /os/work/kernel/locking/rwsem.c:20
  in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
  5 locks held by bash/462:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
   #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
   #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
   #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
  Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460

  CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
   ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
   0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
  Call Trace:
   [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
   [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
   [<ffffffff81d05974>] down_read+0x24/0x60
   [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
   [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
   [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
   [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
   [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
   [<ffffffff811f121d>] SyS_write+0x4d/0xc0
   [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b

freezer->lock used to be used in hot paths but that time is long gone
and there's no reason for the lock to be IRQ-safe spinlock or even
per-cgroup.  In fact, given the fact that a cgroup may contain large
number of tasks, it's not a good idea to iterate over them while
holding IRQ-safe spinlock.

Let's simplify locking by replacing per-cgroup freezer->lock with
global freezer_mutex.  This also makes the comments explaining the
intricacies of policy inheritance and the locking around it as the
states are protected by a common mutex.

The conversion is mostly straight-forward.  The followings are worth
mentioning.

* freezer_css_online() no longer needs double locking.

* freezer_attach() now performs propagation simply while holding
  freezer_mutex.  update_if_frozen() race no longer exists and the
  comment is removed.

* freezer_fork() now tests whether the task is in root cgroup using
  the new task_css_is_root() without doing rcu_read_lock/unlock().  If
  not, it grabs freezer_mutex and performs the operation.

* freezer_read() and freezer_change_state() grab freezer_mutex across
  the whole operation and pin the css while iterating so that each
  descendant processing happens in sleepable context.

Fixes: 96d365e ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
abhishek-altiscale pushed a commit that referenced this pull request Jul 9, 2014
commit 71abdc1 upstream.

When kswapd exits, it can end up taking locks that were previously held
by allocating tasks while they waited for reclaim.  Lockdep currently
warns about this:

On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote:
>  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
>  kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130
>  {RECLAIM_FS-ON-W} state was registered at:
>     mark_held_locks+0xb9/0x140
>     lockdep_trace_alloc+0x7a/0xe0
>     kmem_cache_alloc_trace+0x37/0x240
>     flex_array_alloc+0x99/0x1a0
>     cgroup_attach_task+0x63/0x430
>     attach_task_by_pid+0x210/0x280
>     cgroup_procs_write+0x16/0x20
>     cgroup_file_write+0x120/0x2c0
>     vfs_write+0xc0/0x1f0
>     SyS_write+0x4c/0xa0
>     tracesys+0xdd/0xe2
>  irq event stamp: 49
>  hardirqs last  enabled at (49):  _raw_spin_unlock_irqrestore+0x36/0x70
>  hardirqs last disabled at (48):  _raw_spin_lock_irqsave+0x2b/0xa0
>  softirqs last  enabled at (0):  copy_process.part.24+0x627/0x15f0
>  softirqs last disabled at (0):            (null)
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(&sig->group_rwsem);
>    <Interrupt>
>      lock(&sig->group_rwsem);
>
>   *** DEADLOCK ***
>
>  no locks held by kswapd2/1151.
>
>  stack backtrace:
>  CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4
>  Call Trace:
>    dump_stack+0x19/0x1b
>    print_usage_bug+0x1f7/0x208
>    mark_lock+0x21d/0x2a0
>    __lock_acquire+0x52a/0xb60
>    lock_acquire+0xa2/0x140
>    down_read+0x51/0xa0
>    exit_signals+0x24/0x130
>    do_exit+0xb5/0xa50
>    kthread+0xdb/0x100
>    ret_from_fork+0x7c/0xb0

This is because the kswapd thread is still marked as a reclaimer at the
time of exit.  But because it is exiting, nobody is actually waiting on
it to make reclaim progress anymore, and it's nothing but a regular
thread at this point.  Be tidy and strip it of all its powers
(PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state)
before returning from the thread function.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cwimmer-altiscale pushed a commit that referenced this pull request Dec 22, 2014
…ap()

commit aef4885 upstream.

Error report likely result in IO so it is bad idea to do it from
atomic context.

This patch should fix following issue:

BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1
5 locks held by kworker/u128:1/137:
 #0:  ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
 #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
 #2:  (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0
 #3:  (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430
 #4:  (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630
CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
Workqueue: writeback bdi_writeback_workfn (flush-1:0)
 0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288
 ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38
 ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000
Call Trace:
 [<ffffffff815c7fdc>] dump_stack+0x51/0x6d
 [<ffffffff8108fb30>] __might_sleep+0xf0/0x100
 [<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0
 [<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20
 [<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230
 [<ffffffff8120fa03>] save_error_info+0x23/0x30
 [<ffffffff8120fd06>] __ext4_error+0xb6/0xd0
 [<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190
 [<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630
 [<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0
 [<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60
 [<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80
 [<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280
 [<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190
 [<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410
 [<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380
 [<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0
 [<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180
 [<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0
 [<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0
 [<ffffffff81221797>] ? ext4_find_extent+0x97/0x320
 [<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050
 [<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430
 [<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430
 [<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50
 [<ffffffff811feb44>] ext4_writepages+0x564/0xaf0
 [<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0
 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
 [<ffffffff811377e3>] do_writepages+0x23/0x40
 [<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280
 [<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490
 [<ffffffff811a0664>] wb_writeback+0x174/0x2d0
 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
 [<ffffffff811a0863>] wb_do_writeback+0xa3/0x200
 [<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230
 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
 [<ffffffff810856cd>] process_one_work+0x2dd/0x4d0
 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
 [<ffffffff81085c1d>] worker_thread+0x35d/0x460
 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
 [<ffffffff8108a885>] kthread+0xf5/0x100
 [<ffffffff810990e5>] ? local_clock+0x25/0x30
 [<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108a790>] ? __init_kthread_work

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Sep 23, 2020
[ Upstream commit d26383dcb2b4b8629fde05270b4e3633be9e3d4b ]

The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Oct 29, 2020
[ Upstream commit 71a174b39f10b4b93223d374722aa894b5d8a82e ]

b6da31b2c07c "tty: Fix data race in tty_insert_flip_string_fixed_flag"
puts tty_flip_buffer_push under port->lock introducing the following
possible circular locking dependency:

[30129.876566] ======================================================
[30129.876566] WARNING: possible circular locking dependency detected
[30129.876567] 5.9.0-rc2+ #3 Tainted: G S      W
[30129.876568] ------------------------------------------------------
[30129.876568] sysrq.sh/1222 is trying to acquire lock:
[30129.876569] ffffffff92c39480 (console_owner){....}-{0:0}, at: console_unlock+0x3fe/0xa90

[30129.876572] but task is already holding lock:
[30129.876572] ffff888107cb9018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state.cold.55+0x15b/0x6ca

[30129.876576] which lock already depends on the new lock.

[30129.876577] the existing dependency chain (in reverse order) is:

[30129.876578] -> #3 (&pool->lock/1){-.-.}-{2:2}:
[30129.876581]        _raw_spin_lock+0x30/0x70
[30129.876581]        __queue_work+0x1a3/0x10f0
[30129.876582]        queue_work_on+0x78/0x80
[30129.876582]        pty_write+0x165/0x1e0
[30129.876583]        n_tty_write+0x47f/0xf00
[30129.876583]        tty_write+0x3d6/0x8d0
[30129.876584]        vfs_write+0x1a8/0x650

[30129.876588] -> #2 (&port->lock#2){-.-.}-{2:2}:
[30129.876590]        _raw_spin_lock_irqsave+0x3b/0x80
[30129.876591]        tty_port_tty_get+0x1d/0xb0
[30129.876592]        tty_port_default_wakeup+0xb/0x30
[30129.876592]        serial8250_tx_chars+0x3d6/0x970
[30129.876593]        serial8250_handle_irq.part.12+0x216/0x380
[30129.876593]        serial8250_default_handle_irq+0x82/0xe0
[30129.876594]        serial8250_interrupt+0xdd/0x1b0
[30129.876595]        __handle_irq_event_percpu+0xfc/0x850

[30129.876602] -> #1 (&port->lock){-.-.}-{2:2}:
[30129.876605]        _raw_spin_lock_irqsave+0x3b/0x80
[30129.876605]        serial8250_console_write+0x12d/0x900
[30129.876606]        console_unlock+0x679/0xa90
[30129.876606]        register_console+0x371/0x6e0
[30129.876607]        univ8250_console_init+0x24/0x27
[30129.876607]        console_init+0x2f9/0x45e

[30129.876609] -> #0 (console_owner){....}-{0:0}:
[30129.876611]        __lock_acquire+0x2f70/0x4e90
[30129.876612]        lock_acquire+0x1ac/0xad0
[30129.876612]        console_unlock+0x460/0xa90
[30129.876613]        vprintk_emit+0x130/0x420
[30129.876613]        printk+0x9f/0xc5
[30129.876614]        show_pwq+0x154/0x618
[30129.876615]        show_workqueue_state.cold.55+0x193/0x6ca
[30129.876615]        __handle_sysrq+0x244/0x460
[30129.876616]        write_sysrq_trigger+0x48/0x4a
[30129.876616]        proc_reg_write+0x1a6/0x240
[30129.876617]        vfs_write+0x1a8/0x650

[30129.876619] other info that might help us debug this:

[30129.876620] Chain exists of:
[30129.876621]   console_owner --> &port->lock#2 --> &pool->lock/1

[30129.876625]  Possible unsafe locking scenario:

[30129.876626]        CPU0                    CPU1
[30129.876626]        ----                    ----
[30129.876627]   lock(&pool->lock/1);
[30129.876628]                                lock(&port->lock#2);
[30129.876630]                                lock(&pool->lock/1);
[30129.876631]   lock(console_owner);

[30129.876633]  *** DEADLOCK ***

[30129.876634] 5 locks held by sysrq.sh/1222:
[30129.876634]  #0: ffff8881d3ce0470 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x359/0x650
[30129.876637]  #1: ffffffff92c612c0 (rcu_read_lock){....}-{1:2}, at: __handle_sysrq+0x4d/0x460
[30129.876640]  #2: ffffffff92c612c0 (rcu_read_lock){....}-{1:2}, at: show_workqueue_state+0x5/0xf0
[30129.876642]  #3: ffff888107cb9018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state.cold.55+0x15b/0x6ca
[30129.876645]  #4: ffffffff92c39980 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x123/0x420

[30129.876648] stack backtrace:
[30129.876649] CPU: 3 PID: 1222 Comm: sysrq.sh Tainted: G S      W         5.9.0-rc2+ #3
[30129.876649] Hardware name: Intel Corporation 2012 Client Platform/Emerald Lake 2, BIOS ACRVMBY1.86C.0078.P00.1201161002 01/16/2012
[30129.876650] Call Trace:
[30129.876650]  dump_stack+0x9d/0xe0
[30129.876651]  check_noncircular+0x34f/0x410
[30129.876653]  __lock_acquire+0x2f70/0x4e90
[30129.876656]  lock_acquire+0x1ac/0xad0
[30129.876658]  console_unlock+0x460/0xa90
[30129.876660]  vprintk_emit+0x130/0x420
[30129.876660]  printk+0x9f/0xc5
[30129.876661]  show_pwq+0x154/0x618
[30129.876662]  show_workqueue_state.cold.55+0x193/0x6ca
[30129.876664]  __handle_sysrq+0x244/0x460
[30129.876665]  write_sysrq_trigger+0x48/0x4a
[30129.876665]  proc_reg_write+0x1a6/0x240
[30129.876666]  vfs_write+0x1a8/0x650

It looks like the commit was aimed to protect tty_insert_flip_string and
there is no need for tty_flip_buffer_push to be under this lock.

Fixes: b6da31b2c07c ("tty: Fix data race in tty_insert_flip_string_fixed_flag")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20200902120045.3693075-1-asavkov@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Nov 19, 2020
[ Upstream commit ca10845a56856fff4de3804c85e6424d0f6d0cde ]

While running btrfs/061, btrfs/073, btrfs/078, or btrfs/178 we hit the
following lockdep splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-rc3+ #4 Not tainted
  ------------------------------------------------------
  kswapd0/100 is trying to acquire lock:
  ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

  but task is already holding lock:
  ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (fs_reclaim){+.+.}-{0:0}:
	 fs_reclaim_acquire+0x65/0x80
	 slab_pre_alloc_hook.constprop.0+0x20/0x200
	 kmem_cache_alloc+0x37/0x270
	 alloc_inode+0x82/0xb0
	 iget_locked+0x10d/0x2c0
	 kernfs_get_inode+0x1b/0x130
	 kernfs_get_tree+0x136/0x240
	 sysfs_get_tree+0x16/0x40
	 vfs_get_tree+0x28/0xc0
	 path_mount+0x434/0xc00
	 __x64_sys_mount+0xe3/0x120
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (kernfs_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7e0
	 kernfs_add_one+0x23/0x150
	 kernfs_create_link+0x63/0xa0
	 sysfs_do_create_link_sd+0x5e/0xd0
	 btrfs_sysfs_add_devices_dir+0x81/0x130
	 btrfs_init_new_device+0x67f/0x1250
	 btrfs_ioctl+0x1ef/0x2e20
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7e0
	 btrfs_chunk_alloc+0x125/0x3a0
	 find_free_extent+0xdf6/0x1210
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb0/0x310
	 alloc_tree_block_no_bg_flush+0x4a/0x60
	 __btrfs_cow_block+0x11a/0x530
	 btrfs_cow_block+0x104/0x220
	 btrfs_search_slot+0x52e/0x9d0
	 btrfs_insert_empty_items+0x64/0xb0
	 btrfs_insert_delayed_items+0x90/0x4f0
	 btrfs_commit_inode_delayed_items+0x93/0x140
	 btrfs_log_inode+0x5de/0x2020
	 btrfs_log_inode_parent+0x429/0xc90
	 btrfs_log_new_name+0x95/0x9b
	 btrfs_rename2+0xbb9/0x1800
	 vfs_rename+0x64f/0x9f0
	 do_renameat2+0x320/0x4e0
	 __x64_sys_rename+0x1f/0x30
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
	 __lock_acquire+0x119c/0x1fc0
	 lock_acquire+0xa7/0x3d0
	 __mutex_lock+0x7e/0x7e0
	 __btrfs_release_delayed_node.part.0+0x3f/0x330
	 btrfs_evict_inode+0x24c/0x500
	 evict+0xcf/0x1f0
	 dispose_list+0x48/0x70
	 prune_icache_sb+0x44/0x50
	 super_cache_scan+0x161/0x1e0
	 do_shrink_slab+0x178/0x3c0
	 shrink_slab+0x17c/0x290
	 shrink_node+0x2b2/0x6d0
	 balance_pgdat+0x30a/0x670
	 kswapd+0x213/0x4c0
	 kthread+0x138/0x160
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> kernfs_mutex --> fs_reclaim

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(fs_reclaim);
				 lock(kernfs_mutex);
				 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/100:
   #0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
   #2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

  stack backtrace:
  CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #4
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x8b/0xb8
   check_noncircular+0x12d/0x150
   __lock_acquire+0x119c/0x1fc0
   lock_acquire+0xa7/0x3d0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   __mutex_lock+0x7e/0x7e0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? lock_acquire+0xa7/0x3d0
   ? find_held_lock+0x2b/0x80
   __btrfs_release_delayed_node.part.0+0x3f/0x330
   btrfs_evict_inode+0x24c/0x500
   evict+0xcf/0x1f0
   dispose_list+0x48/0x70
   prune_icache_sb+0x44/0x50
   super_cache_scan+0x161/0x1e0
   do_shrink_slab+0x178/0x3c0
   shrink_slab+0x17c/0x290
   shrink_node+0x2b2/0x6d0
   balance_pgdat+0x30a/0x670
   kswapd+0x213/0x4c0
   ? _raw_spin_unlock_irqrestore+0x41/0x50
   ? add_wait_queue_exclusive+0x70/0x70
   ? balance_pgdat+0x670/0x670
   kthread+0x138/0x160
   ? kthread_create_worker_on_cpu+0x40/0x40
   ret_from_fork+0x1f/0x30

This happens because we are holding the chunk_mutex at the time of
adding in a new device.  However we only need to hold the
device_list_mutex, as we're going to iterate over the fs_devices
devices.  Move the sysfs init stuff outside of the chunk_mutex to get
rid of this lockdep splat.

CC: stable@vger.kernel.org # 4.4.x: f3cd2c58110dad14e: btrfs: sysfs, rename device_link add/remove functions
CC: stable@vger.kernel.org # 4.4.x
Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Nov 24, 2020
commit 6f117cb854a44a79898d844e6ae3fd23bd94e786 upstream.

When requeueing all requests on the device request queue to the blocklayer
we might get to an ERP (error recovery) request that is a copy of an
original CQR.

Those requests do not have blocklayer request information or a pointer to
the dasd_queue set. When trying to access those data it will lead to a
null pointer dereference in dasd_requeue_all_requests().

Fix by checking if the request is an ERP request that can simply be
ignored. The blocklayer request will be requeued by the original CQR that
is on the device queue right behind the ERP request.

Fixes: 9487cfd ("s390/dasd: fix handling of internal requests")
Cc: <stable@vger.kernel.org> #4.16
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Dec 2, 2020
[ Upstream commit e773ca7da8beeca7f17fe4c9d1284a2b66839cc1 ]

Actually, burst size is equal to '1 << desc->rqcfg.brst_size'.
we should use burst size, not desc->rqcfg.brst_size.

dma memcpy performance on Rockchip RV1126
@ 1512MHz A7, 1056MHz LPDDR3, 200MHz DMA:

dmatest:

/# echo dma0chan0 > /sys/module/dmatest/parameters/channel
/# echo 4194304 > /sys/module/dmatest/parameters/test_buf_size
/# echo 8 > /sys/module/dmatest/parameters/iterations
/# echo y > /sys/module/dmatest/parameters/norandom
/# echo y > /sys/module/dmatest/parameters/verbose
/# echo 1 > /sys/module/dmatest/parameters/run

dmatest: dma0chan0-copy0: result #1: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #2: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #3: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #4: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #5: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #6: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #7: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
dmatest: dma0chan0-copy0: result #8: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000

Before:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 48 iops 200338 KB/s (0)

After this patch:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 179 iops 734873 KB/s (0)

After this patch and increase dma clk to 400MHz:

  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 259 iops 1062929 KB/s (0)

Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com>
Link: https://lore.kernel.org/r/1605326106-55681-1-git-send-email-sugar.zhang@rock-chips.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Dec 2, 2020
[ Upstream commit a0faaa27c71608799e0dd765c5af38a089091802 ]

adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset
did not complete after freeing sub crqs. Check for NULL before
dereferencing them.

Snippet of call trace:
ibmvnic 30000006 env6: Releasing sub-CRQ
ibmvnic 30000006 env6: Releasing CRQ
...
ibmvnic 30000006 env6: Got Control IP offload Response
ibmvnic 30000006 env6: Re-setting tx_scrq[0]
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc008000003dea7cc
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tun bridge stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmvnic ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
CPU: 80 PID: 1856 Comm: kworker/80:2 Tainted: G        W         5.8.0+ #4
Workqueue: events __ibmvnic_reset [ibmvnic]
NIP:  c008000003dea7cc LR: c008000003dea7bc CTR: 0000000000000000
REGS: c0000007ef7db860 TRAP: 0380   Tainted: G        W          (5.8.0+)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28002422  XER: 0000000d
CFAR: c000000000bd9520 IRQMASK: 0
GPR00: c008000003dea7bc c0000007ef7dbaf0 c008000003df7400 c0000007fa26ec00
GPR04: c0000007fcd0d008 c0000007fcd96350 0000000000000027 c0000007fcd0d010
GPR08: 0000000000000023 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000002000 c00000001ec18e00 c0000000001982f8 c0000007bad6e840
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 fffffffffffffef7
GPR24: 0000000000000402 c0000007fa26f3a8 0000000000000003 c00000016f8ec048
GPR28: 0000000000000000 0000000000000000 0000000000000000 c0000007fa26ec00
NIP [c008000003dea7cc] ibmvnic_reset_init+0x15c/0x258 [ibmvnic]
LR [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic]
Call Trace:
[c0000007ef7dbaf0] [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] (unreliable)
[c0000007ef7dbb80] [c008000003de8860] __ibmvnic_reset+0x408/0x970 [ibmvnic]
[c0000007ef7dbc50] [c00000000018b7cc] process_one_work+0x2cc/0x800
[c0000007ef7dbd20] [c00000000018bd78] worker_thread+0x78/0x520
[c0000007ef7dbdb0] [c0000000001984c4] kthread+0x1d4/0x1e0
[c0000007ef7dbe20] [c00000000000cea8] ret_from_kernel_thread+0x5c/0x74

Fixes: 57a4943 ("ibmvnic: Reset sub-crqs during driver reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Dec 29, 2020
[ Upstream commit 4a9d81caf841cd2c0ae36abec9c2963bf21d0284 ]

If the elem is deleted during be iterated on it, the iteration
process will fall into an endless loop.

kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [nfsd:17137]

PID: 17137  TASK: ffff8818d93c0000  CPU: 4   COMMAND: "nfsd"
    [exception RIP: __state_in_grace+76]
    RIP: ffffffffc00e817c  RSP: ffff8818d3aefc98  RFLAGS: 00000246
    RAX: ffff881dc0c38298  RBX: ffffffff81b03580  RCX: ffff881dc02c9f50
    RDX: ffff881e3fce8500  RSI: 0000000000000001  RDI: ffffffff81b03580
    RBP: ffff8818d3aefca0   R8: 0000000000000020   R9: ffff8818d3aefd40
    R10: ffff88017fc03800  R11: ffff8818e83933c0  R12: ffff8818d3aefd40
    R13: 0000000000000000  R14: ffff8818e8391068  R15: ffff8818fa6e4000
    CS: 0010  SS: 0018
 #0 [ffff8818d3aefc98] opens_in_grace at ffffffffc00e81e3 [grace]
 #1 [ffff8818d3aefca8] nfs4_preprocess_stateid_op at ffffffffc02a3e6c [nfsd]
 #2 [ffff8818d3aefd18] nfsd4_write at ffffffffc028ed5b [nfsd]
 #3 [ffff8818d3aefd80] nfsd4_proc_compound at ffffffffc0290a0d [nfsd]
 #4 [ffff8818d3aefdd0] nfsd_dispatch at ffffffffc027b800 [nfsd]
 #5 [ffff8818d3aefe08] svc_process_common at ffffffffc02017f3 [sunrpc]
 #6 [ffff8818d3aefe70] svc_process at ffffffffc0201ce3 [sunrpc]
 #7 [ffff8818d3aefe98] nfsd at ffffffffc027b117 [nfsd]
 #8 [ffff8818d3aefec8] kthread at ffffffff810b88c1
 #9 [ffff8818d3aeff50] ret_from_fork at ffffffff816d1607

The troublemake elem:
crash> lock_manager ffff881dc0c38298
struct lock_manager {
  list = {
    next = 0xffff881dc0c38298,
    prev = 0xffff881dc0c38298
  },
  block_opens = false
}

Fixes: c87fb4a ("lockd: NLM grace period shouldn't block NFSv4 opens")
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Feb 4, 2021
commit caab13b4960416b9fee83169a758eb0f31e65109 upstream.

Since at91_soc_init is called unconditionally from atmel_soc_device_init,
we get the following warning on all non AT91 SoCs:
	" AT91: Could not find identification node"

Fix the same by filtering with allowed AT91 SoC list.

Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: stable@vger.kernel.org #4.12+
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201211135846.1334322-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Feb 4, 2021
…st[]

commit 680896556805d3ad3fa47f6002b87b3041a45ac2 upstream.

of_match_node() calls __of_match_node() which loops though the entries of
matches array. It stops when condition:
(matches->name[0] || matches->type[0] || matches->compatible[0]) is
false. Thus, add a null entry at the end of at91_soc_allowed_list[]
array.

Fixes: caab13b49604 ("drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs")
Cc: stable@vger.kernel.org #4.12+
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Mar 4, 2021
[ Upstream commit 7f9942c61fa60eda7cc8e42f04bd25b7d175876e ]

Building with the clang integrated assembler produces a couple of
errors for the s3c24xx fiq support:

  arch/arm/mach-s3c/irq-s3c24xx-fiq.S:52:2: error: instruction 'subne' can not set flags, but 's' suffix specified
    subnes pc, lr, #4 @@ return, still have work to do

  arch/arm/mach-s3c/irq-s3c24xx-fiq.S:64:1: error: invalid symbol redefinition
    s3c24xx_spi_fiq_txrx:

There are apparently two problems: one with extraneous or duplicate
labels, and one with old-style opcode mnemonics. Stefan Agner has
previously fixed other problems like this, but missed this particular
file.

Fixes: bec0806 ("spi_s3c24xx: add FIQ pseudo-DMA support")
Cc: Stefan Agner <stefan@agner.ch>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20210204162416.3030114-1-arnd@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Mar 4, 2021
[ Upstream commit c5c97cadd7ed13381cb6b4bef5c841a66938d350 ]

The ubsan reported the following error.  It was because sample's raw
data missed u32 padding at the end.  So it broke the alignment of the
array after it.

The raw data contains an u32 size prefix so the data size should have
an u32 padding after 8-byte aligned data.

27: Sample parsing  :util/synthetic-events.c:1539:4:
  runtime error: store to misaligned address 0x62100006b9bc for type
  '__u64' (aka 'unsigned long long'), which requires 8 byte alignment
0x62100006b9bc: note: pointer points here
  00 00 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
              ^
    #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13
    #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8
    #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9
    #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9
    #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9
    #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4
    #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9
    #7 0x56153264e796 in run_builtin perf.c:312:11
    #8 0x56153264cf03 in handle_internal_command perf.c:364:8
    #9 0x56153264e47d in run_argv perf.c:408:2
    #10 0x56153264c9a9 in main perf.c:538:3
    #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc)
    #12 0x561532596828 in _start ...

SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use
 util/synthetic-events.c:1539:4 in

Fixes: 045f8cd ("perf tests: Add a sample parsing test")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210214091638.519643-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Mar 17, 2021
commit ac0bbf55ed3be75fde1f8907e91ecd2fd589bde3 upstream.

The digital input subdevice supports Comedi asynchronous commands that
read interrupt status information.  This uses 16-bit Comedi samples (of
which only the bottom 8 bits contain status information).  However, the
interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit variable `unsigned int status`.  On a bigendian
machine, this will copy 2 bytes from the wrong end of the variable.  Fix
it by changing the type of the variable to `unsigned short`.

Fixes: a8c66b6 ("staging: comedi: addi_apci_1500: rewrite the subdevice support functions")
Cc: <stable@vger.kernel.org> #4.0+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20210223143055.257402-3-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Apr 16, 2021
commit 90bd070aae6c4fb5d302f9c4b9c88be60c8197ec upstream.

The following deadlock is detected:

  truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

  PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
   #0  __schedule at ffffffff818667cc
   #1  schedule at ffffffff81866de6
   #2  inode_dio_wait at ffffffff812a2d04
   #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
   #4  notify_change at ffffffff812a5a09
   #5  do_truncate at ffffffff812808f5
   #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
   #7  sys_ftruncate at ffffffff81280d8e
   #8  do_syscall_64 at ffffffff81003949
   #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:

   #0  __schedule+700 at ffffffff818667cc
   #1  schedule+54 at ffffffff81866de6
   #2  rwsem_down_write_failed+536 at ffffffff8186aa28
   #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
   #4  down_write+45 at ffffffff81869c9d
   #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
   #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
   #7  dio_complete+140 at ffffffff812c873c
   #8  dio_aio_complete_work+25 at ffffffff812c89f9
   #9  process_one_work+361 at ffffffff810b1889
  #10  worker_thread+77 at ffffffff810b233d
  #11  kthread+261 at ffffffff810b7fd5
  #12  ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock.  The same deadlock was mentioned in
upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()").  It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.

End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.

This is to fix the deadlock itself.  It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.

[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
  Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request May 22, 2021
commit 376565b9717c30cd58ad33860fa42697615fa2e4 upstream.

KMSAN complains that the vmci_use_ppn64() == false path in
vmci_dbell_register_notification_bitmap() left upper 32bits of
bitmap_set_msg.bitmap_ppn64 member uninitialized.

  =====================================================
  BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc7+ #4
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
  Call Trace:
   dump_stack+0x21c/0x280
   kmsan_report+0xfb/0x1e0
   kmsan_internal_check_memory+0x484/0x520
   kmsan_check_memory+0xd/0x10
   iowrite8_rep+0x86/0x380
   vmci_send_datagram+0x150/0x280
   vmci_dbell_register_notification_bitmap+0x133/0x1e0
   vmci_guest_probe_device+0xcab/0x1e70
   pci_device_probe+0xab3/0xe70
   really_probe+0xd16/0x24d0
   driver_probe_device+0x29d/0x3a0
   device_driver_attach+0x25a/0x490
   __driver_attach+0x78c/0x840
   bus_for_each_dev+0x210/0x340
   driver_attach+0x89/0xb0
   bus_add_driver+0x677/0xc40
   driver_register+0x485/0x8e0
   __pci_register_driver+0x1ff/0x350
   vmci_guest_init+0x3e/0x41
   vmci_drv_init+0x1d6/0x43f
   do_one_initcall+0x39c/0x9a0
   do_initcall_level+0x1d7/0x259
   do_initcalls+0x127/0x1cb
   do_basic_setup+0x33/0x36
   kernel_init_freeable+0x29a/0x3ed
   kernel_init+0x1f/0x840
   ret_from_fork+0x1f/0x30

  Local variable ----bitmap_set_msg@vmci_dbell_register_notification_bitmap created at:
   vmci_dbell_register_notification_bitmap+0x50/0x1e0
   vmci_dbell_register_notification_bitmap+0x50/0x1e0

  Bytes 28-31 of 32 are uninitialized
  Memory access of size 32 starts at ffff88810098f570
  =====================================================

Fixes: 83e2ec7 ("VMCI: doorbell implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20210402121742.3917-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request May 22, 2021
commit b2192cfeba8481224da0a4ec3b4a7ccd80b1623b upstream.

KMSAN complains that vmci_check_host_caps() left the payload part of
check_msg uninitialized.

  =====================================================
  BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10
  CPU: 1 PID: 1 Comm: swapper/0 Tainted: G    B             5.11.0-rc7+ #4
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
  Call Trace:
   dump_stack+0x21c/0x280
   kmsan_report+0xfb/0x1e0
   kmsan_internal_check_memory+0x202/0x520
   kmsan_check_memory+0xd/0x10
   iowrite8_rep+0x86/0x380
   vmci_guest_probe_device+0xf0b/0x1e70
   pci_device_probe+0xab3/0xe70
   really_probe+0xd16/0x24d0
   driver_probe_device+0x29d/0x3a0
   device_driver_attach+0x25a/0x490
   __driver_attach+0x78c/0x840
   bus_for_each_dev+0x210/0x340
   driver_attach+0x89/0xb0
   bus_add_driver+0x677/0xc40
   driver_register+0x485/0x8e0
   __pci_register_driver+0x1ff/0x350
   vmci_guest_init+0x3e/0x41
   vmci_drv_init+0x1d6/0x43f
   do_one_initcall+0x39c/0x9a0
   do_initcall_level+0x1d7/0x259
   do_initcalls+0x127/0x1cb
   do_basic_setup+0x33/0x36
   kernel_init_freeable+0x29a/0x3ed
   kernel_init+0x1f/0x840
   ret_from_fork+0x1f/0x30

  Uninit was created at:
   kmsan_internal_poison_shadow+0x5c/0xf0
   kmsan_slab_alloc+0x8d/0xe0
   kmem_cache_alloc+0x84f/0xe30
   vmci_guest_probe_device+0xd11/0x1e70
   pci_device_probe+0xab3/0xe70
   really_probe+0xd16/0x24d0
   driver_probe_device+0x29d/0x3a0
   device_driver_attach+0x25a/0x490
   __driver_attach+0x78c/0x840
   bus_for_each_dev+0x210/0x340
   driver_attach+0x89/0xb0
   bus_add_driver+0x677/0xc40
   driver_register+0x485/0x8e0
   __pci_register_driver+0x1ff/0x350
   vmci_guest_init+0x3e/0x41
   vmci_drv_init+0x1d6/0x43f
   do_one_initcall+0x39c/0x9a0
   do_initcall_level+0x1d7/0x259
   do_initcalls+0x127/0x1cb
   do_basic_setup+0x33/0x36
   kernel_init_freeable+0x29a/0x3ed
   kernel_init+0x1f/0x840
   ret_from_fork+0x1f/0x30

  Bytes 28-31 of 36 are uninitialized
  Memory access of size 36 starts at ffff8881675e5f00
  =====================================================

Fixes: 1f16643 ("VMCI: guest side driver implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20210402121742.3917-2-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Jul 20, 2021
[ Upstream commit 2c484419efc09e7234c667aa72698cb79ba8d8ed ]

lz4 compatible decompressor is simple.  The format is underspecified and
relies on EOF notification to determine when to stop.  Initramfs buffer
format[1] explicitly states that it can have arbitrary number of zero
padding.  Thus when operating without a fill function, be extra careful to
ensure that sizes less than 4, or apperantly empty chunksizes are treated
as EOF.

To test this I have created two cpio initrds, first a normal one,
main.cpio.  And second one with just a single /test-file with content
"second" second.cpio.  Then i compressed both of them with gzip, and with
lz4 -l.  Then I created a padding of 4 bytes (dd if=/dev/zero of=pad4 bs=1
count=4).  To create four testcase initrds:

 1) main.cpio.gzip + extra.cpio.gzip = pad0.gzip
 2) main.cpio.lz4  + extra.cpio.lz4 = pad0.lz4
 3) main.cpio.gzip + pad4 + extra.cpio.gzip = pad4.gzip
 4) main.cpio.lz4  + pad4 + extra.cpio.lz4 = pad4.lz4

The pad4 test-cases replicate the initrd load by grub, as it pads and
aligns every initrd it loads.

All of the above boot, however /test-file was not accessible in the initrd
for the testcase #4, as decoding in lz4 decompressor failed.  Also an
error message printed which usually is harmless.

Whith a patched kernel, all of the above testcases now pass, and
/test-file is accessible.

This fixes lz4 initrd decompress warning on every boot with grub.  And
more importantly this fixes inability to load multiple lz4 compressed
initrds with grub.  This patch has been shipping in Ubuntu kernels since
January 2021.

[1] ./Documentation/driver-api/early-userspace/buffer-format.rst

BugLink: https://bugs.launchpad.net/bugs/1835660
Link: https://lore.kernel.org/lkml/20210114200256.196589-1-xnox@ubuntu.com/ # v0
Link: https://lkml.kernel.org/r/20210513104831.432975-1-dimitri.ledkov@canonical.com
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Rajat Asthana <thisisrast7@gmail.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Aug 26, 2021
commit 438553958ba19296663c6d6583d208dfb6792830 upstream.

The ordering of MSI-X enable in hardware is dysfunctional:

 1) MSI-X is disabled in the control register
 2) Various setup functions
 3) pci_msi_setup_msi_irqs() is invoked which ends up accessing
    the MSI-X table entries
 4) MSI-X is enabled and masked in the control register with the
    comment that enabling is required for some hardware to access
    the MSI-X table

Step #4 obviously contradicts #3. The history of this is an issue with the
NIU hardware. When #4 was introduced the table access actually happened in
msix_program_entries() which was invoked after enabling and masking MSI-X.

This was changed in commit d71d643 ("PCI/MSI: Kill redundant call of
irq_set_msi_desc() for MSI-X interrupts") which removed the table write
from msix_program_entries().

Interestingly enough nobody noticed and either NIU still works or it did
not get any testing with a kernel 3.19 or later.

Nevertheless this is inconsistent and there is no reason why MSI-X can't be
enabled and masked in the control register early on, i.e. move step #4
above to step #1. This preserves the NIU workaround and has no side effects
on other hardware.

Fixes: d71d643 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Dec 22, 2021
commit f9b58e8c7d031b0daa5c9a9ee27f5a4028ba53ac upstream.

While in theory multiple unwinders could be compiled in, it does
not make sense in practise. Use a choice to make the unwinder
selection mutually exclusive and mandatory.

Already before this commit it has not been possible to deselect
FRAME_POINTER. Remove the obsolete comment.

Furthermore, to produce a meaningful backtrace with FRAME_POINTER
enabled the kernel needs a specific function prologue:
    mov    ip, sp
    stmfd    sp!, {fp, ip, lr, pc}
    sub    fp, ip, #4

To get to the required prologue gcc uses apcs and no-sched-prolog.
This compiler options are not available on clang, and clang is not
able to generate the required prologue. Make the FRAME_POINTER
config symbol depending on !clang.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Feb 23, 2022
commit b7d6f44a0fa716a82969725516dc0b16bc7cd514 upstream.

Rolf Eike Beer reported the following bug:

[1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018
[1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 #4
[1274934.746891] Hardware name: 9000/785/C8000
[1274934.746891]
[1274934.746891]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[1274934.746891] PSW: 00001000000001001111111000001110 Not tainted
[1274934.746891] r00-03  000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000
[1274934.746891] r04-07  0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001
[1274934.746891] r08-11  0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001
[1274934.746891] r12-15  0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0
[1274934.746891] r16-19  0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007
[1274934.746891] r20-23  0000000000000006 000000004a368950 0000000000000000 0000000000000001
[1274934.746891] r24-27  0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0
[1274934.746891] r28-31  0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118
[1274934.746891] sr00-03  00000000066e5800 0000000000000000 0000000000000000 00000000066e5800
[1274934.746891] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[1274934.746891]
[1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec
[1274934.746891]  IIR: 48780030    ISR: 0000000000000000  IOR: 0000004140000018
[1274934.746891]  CPU:        3   CR30: 00000040e3a9c000 CR31: ffffffffffffffff
[1274934.746891]  ORIG_R28: 0000000040acdd58
[1274934.746891]  IAOQ[0]: sba_unmap_sg+0xb0/0x118
[1274934.746891]  IAOQ[1]: sba_unmap_sg+0xb4/0x118
[1274934.746891]  RP(r2): sba_unmap_sg+0xac/0x118
[1274934.746891] Backtrace:
[1274934.746891]  [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70
[1274934.746891]  [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60
[1274934.746891]  [<00000000407a3488>] mptscsih_io_done+0x150/0xd70
[1274934.746891]  [<0000000040798600>] mpt_interrupt+0x168/0xa68
[1274934.746891]  [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278
[1274934.746891]  [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8
[1274934.746891]  [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0
[1274934.746891]  [<00000000402548e0>] generic_handle_irq+0x50/0x70
[1274934.746891]  [<000000004019a254>] call_on_stack+0x18/0x24
[1274934.746891]
[1274934.746891] Kernel panic - not syncing: Bad Address (null pointer deref?)

The bug is caused by overrunning the sglist and incorrectly testing
sg_dma_len(sglist) before nents. Normally this doesn't cause a crash,
but in this case sglist crossed a page boundary. This occurs in the
following code:

	while (sg_dma_len(sglist) && nents--) {

The fix is simply to test nents first and move the decrement of nents
into the loop.

Reported-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Mar 9, 2022
[ Upstream commit 3d6cc9898efdfb062efb74dc18cfc700e082f5d5 ]

When cifs_get_root() fails during cifs_smb3_do_mount() we call
deactivate_locked_super() which eventually will call delayed_free() which
will free the context.
In this situation we should not proceed to enter the out: section in
cifs_smb3_do_mount() and free the same resources a second time.

[Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0

[Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE     5.17.0-rc3+ #4
[Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[Thu Feb 10 12:59:06 2022] Call Trace:
[Thu Feb 10 12:59:06 2022]  <IRQ>
[Thu Feb 10 12:59:06 2022]  dump_stack_lvl+0x5d/0x78
[Thu Feb 10 12:59:06 2022]  print_address_description.constprop.0+0x24/0x150
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  kasan_report.cold+0x7d/0x117
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  __asan_load8+0x86/0xa0
[Thu Feb 10 12:59:06 2022]  rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  rcu_core+0x547/0xca0
[Thu Feb 10 12:59:06 2022]  ? call_rcu+0x3c0/0x3c0
[Thu Feb 10 12:59:06 2022]  ? __this_cpu_preempt_check+0x13/0x20
[Thu Feb 10 12:59:06 2022]  ? lock_is_held_type+0xea/0x140
[Thu Feb 10 12:59:06 2022]  rcu_core_si+0xe/0x10
[Thu Feb 10 12:59:06 2022]  __do_softirq+0x1d4/0x67b
[Thu Feb 10 12:59:06 2022]  __irq_exit_rcu+0x100/0x150
[Thu Feb 10 12:59:06 2022]  irq_exit_rcu+0xe/0x30
[Thu Feb 10 12:59:06 2022]  sysvec_hyperv_stimer0+0x9d/0xc0
...
[Thu Feb 10 12:59:07 2022] Freed by task 58179:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  kasan_set_track+0x25/0x30
[Thu Feb 10 12:59:07 2022]  kasan_set_free_info+0x24/0x40
[Thu Feb 10 12:59:07 2022]  ____kasan_slab_free+0x137/0x170
[Thu Feb 10 12:59:07 2022]  __kasan_slab_free+0x12/0x20
[Thu Feb 10 12:59:07 2022]  slab_free_freelist_hook+0xb3/0x1d0
[Thu Feb 10 12:59:07 2022]  kfree+0xcd/0x520
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0x149/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thu Feb 10 12:59:07 2022] Last potentially related work creation:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  __kasan_record_aux_stack+0xb6/0xc0
[Thu Feb 10 12:59:07 2022]  kasan_record_aux_stack_noalloc+0xb/0x10
[Thu Feb 10 12:59:07 2022]  call_rcu+0x76/0x3c0
[Thu Feb 10 12:59:07 2022]  cifs_umount+0xce/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  cifs_kill_sb+0xc8/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  deactivate_locked_super+0x5d/0xd0
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0xab9/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request May 12, 2022
[ Upstream commit af68656d66eda219b7f55ce8313a1da0312c79e1 ]

While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload

Uninterruptible tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0
[kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0
[kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64

And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +->napi_disable()

Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.

Fixes: 7fa6f34 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Jun 14, 2022
[ Upstream commit 1b0e81416a24d6e9b8c2341e22e8bf48f8b8bfc9 ]

Commit 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
assumes the device_is_registered() in function nfc_dev_up() will help
to check when the rfkill is unregistered. However, this check only
take effect when device_del(&dev->dev) is done in nfc_unregister_device().
Hence, the rfkill object is still possible be dereferenced.

The crash trace in latest kernel (5.18-rc2):

[   68.760105] ==================================================================
[   68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750
[   68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313
[   68.760756]
[   68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4
[   68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   68.760756] Call Trace:
[   68.760756]  <TASK>
[   68.760756]  dump_stack_lvl+0x57/0x7d
[   68.760756]  print_report.cold+0x5e/0x5db
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  kasan_report+0xbe/0x1c0
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  __lock_acquire+0x3ec1/0x6750
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? register_lock_class+0x18d0/0x18d0
[   68.760756]  lock_acquire+0x1ac/0x4f0
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? mutex_lock_io_nested+0x12c0/0x12c0
[   68.760756]  ? nla_get_range_signed+0x540/0x540
[   68.760756]  ? _raw_spin_lock_irqsave+0x4e/0x50
[   68.760756]  _raw_spin_lock_irqsave+0x39/0x50
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  rfkill_blocked+0xe/0x60
[   68.760756]  nfc_dev_up+0x84/0x260
[   68.760756]  nfc_genl_dev_up+0x90/0xe0
[   68.760756]  genl_family_rcv_msg_doit+0x1f4/0x2f0
[   68.760756]  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230
[   68.760756]  ? security_capable+0x51/0x90
[   68.760756]  genl_rcv_msg+0x280/0x500
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? lock_acquire+0x1ac/0x4f0
[   68.760756]  ? nfc_genl_dev_down+0xe0/0xe0
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  netlink_rcv_skb+0x11b/0x340
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? netlink_ack+0x9c0/0x9c0
[   68.760756]  ? netlink_deliver_tap+0x136/0xb00
[   68.760756]  genl_rcv+0x1f/0x30
[   68.760756]  netlink_unicast+0x430/0x710
[   68.760756]  ? memset+0x20/0x40
[   68.760756]  ? netlink_attachskb+0x740/0x740
[   68.760756]  ? __build_skb_around+0x1f4/0x2a0
[   68.760756]  netlink_sendmsg+0x75d/0xc00
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  sock_sendmsg+0xdf/0x110
[   68.760756]  __sys_sendto+0x19e/0x270
[   68.760756]  ? __ia32_sys_getpeername+0xa0/0xa0
[   68.760756]  ? fd_install+0x178/0x4c0
[   68.760756]  ? fd_install+0x195/0x4c0
[   68.760756]  ? kernel_fpu_begin_mask+0x1c0/0x1c0
[   68.760756]  __x64_sys_sendto+0xd8/0x1b0
[   68.760756]  ? lockdep_hardirqs_on+0xbf/0x130
[   68.760756]  ? syscall_enter_from_user_mode+0x1d/0x50
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756] RIP: 0033:0x7f67fb50e6b3
...
[   68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[   68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3
[   68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003
[   68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c
[   68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e
[   68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003

[   68.760756]  </TASK>
[   68.760756]
[   68.760756] Allocated by task 279:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  __kasan_kmalloc+0x81/0xa0
[   68.760756]  rfkill_alloc+0x7f/0x280
[   68.760756]  nfc_register_device+0xa3/0x1a0
[   68.760756]  nci_register_device+0x77a/0xad0
[   68.760756]  nfcmrvl_nci_register_dev+0x20b/0x2c0
[   68.760756]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   68.760756]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   68.760756]  tty_ioctl+0x764/0x1310
[   68.760756]  __x64_sys_ioctl+0x122/0x190
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756]
[   68.760756] Freed by task 314:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  kasan_set_track+0x21/0x30
[   68.760756]  kasan_set_free_info+0x20/0x30
[   68.760756]  __kasan_slab_free+0x108/0x170
[   68.760756]  kfree+0xb0/0x330
[   68.760756]  device_release+0x96/0x200
[   68.760756]  kobject_put+0xf9/0x1d0
[   68.760756]  nfc_unregister_device+0x77/0x190
[   68.760756]  nfcmrvl_nci_unregister_dev+0x88/0xd0
[   68.760756]  nci_uart_tty_close+0xdf/0x180
[   68.760756]  tty_ldisc_kill+0x73/0x110
[   68.760756]  tty_ldisc_hangup+0x281/0x5b0
[   68.760756]  __tty_hangup.part.0+0x431/0x890
[   68.760756]  tty_release+0x3a8/0xc80
[   68.760756]  __fput+0x1f0/0x8c0
[   68.760756]  task_work_run+0xc9/0x170
[   68.760756]  exit_to_user_mode_prepare+0x194/0x1a0
[   68.760756]  syscall_exit_to_user_mode+0x19/0x50
[   68.760756]  do_syscall_64+0x48/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae

This patch just add the null out of dev->rfkill to make sure such
dereference cannot happen. This is safe since the device_lock() already
protect the check/write from data race.

Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Jun 25, 2022
[ Upstream commit 1b710b1b10eff9d46666064ea25f079f70bc67a8 ]

Sergey didn't like the locking order,

uart_port->lock  ->  tty_port->lock

uart_write (uart_port->lock)
  __uart_start
    pl011_start_tx
      pl011_tx_chars
        uart_write_wakeup
          tty_port_tty_wakeup
            tty_port_default
              tty_port_tty_get (tty_port->lock)

but those code is so old, and I have no clue how to de-couple it after
checking other locks in the splat. There is an onging effort to make all
printk() as deferred, so until that happens, workaround it for now as a
short-term fix.

LTP: starting iogen01 (export LTPROOT; rwtest -N iogen01 -i 120s -s
read,write -Da -Dv -n 2 500b:$TMPDIR/doio.f1.$$
1000b:$TMPDIR/doio.f2.$$)
WARNING: possible circular locking dependency detected
------------------------------------------------------
doio/49441 is trying to acquire lock:
ffff008b7cff7290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0x138/0x2050

but task is already holding lock:
60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #4 (&pool->lock/1){-.-.}:
       lock_acquire+0x320/0x360
       _raw_spin_lock+0x64/0x80
       __queue_work+0x4b4/0xa10
       queue_work_on+0xac/0x11c
       tty_schedule_flip+0x84/0xbc
       tty_flip_buffer_push+0x1c/0x28
       pty_write+0x98/0xd0
       n_tty_write+0x450/0x60c
       tty_write+0x338/0x474
       __vfs_write+0x88/0x214
       vfs_write+0x12c/0x1a4
       redirected_tty_write+0x90/0xdc
       do_loop_readv_writev+0x140/0x180
       do_iter_write+0xe0/0x10c
       vfs_writev+0x134/0x1cc
       do_writev+0xbc/0x130
       __arm64_sys_writev+0x58/0x8c
       el0_svc_handler+0x170/0x240
       el0_sync_handler+0x150/0x250
       el0_sync+0x164/0x180

  -> #3 (&(&port->lock)->rlock){-.-.}:
       lock_acquire+0x320/0x360
       _raw_spin_lock_irqsave+0x7c/0x9c
       tty_port_tty_get+0x24/0x60
       tty_port_default_wakeup+0x1c/0x3c
       tty_port_tty_wakeup+0x34/0x40
       uart_write_wakeup+0x28/0x44
       pl011_tx_chars+0x1b8/0x270
       pl011_start_tx+0x24/0x70
       __uart_start+0x5c/0x68
       uart_write+0x164/0x1c8
       do_output_char+0x33c/0x348
       n_tty_write+0x4bc/0x60c
       tty_write+0x338/0x474
       redirected_tty_write+0xc0/0xdc
       do_loop_readv_writev+0x140/0x180
       do_iter_write+0xe0/0x10c
       vfs_writev+0x134/0x1cc
       do_writev+0xbc/0x130
       __arm64_sys_writev+0x58/0x8c
       el0_svc_handler+0x170/0x240
       el0_sync_handler+0x150/0x250
       el0_sync+0x164/0x180

  -> #2 (&port_lock_key){-.-.}:
       lock_acquire+0x320/0x360
       _raw_spin_lock+0x64/0x80
       pl011_console_write+0xec/0x2cc
       console_unlock+0x794/0x96c
       vprintk_emit+0x260/0x31c
       vprintk_default+0x54/0x7c
       vprintk_func+0x218/0x254
       printk+0x7c/0xa4
       register_console+0x734/0x7b0
       uart_add_one_port+0x734/0x834
       pl011_register_port+0x6c/0xac
       sbsa_uart_probe+0x234/0x2ec
       platform_drv_probe+0xd4/0x124
       really_probe+0x250/0x71c
       driver_probe_device+0xb4/0x200
       __device_attach_driver+0xd8/0x188
       bus_for_each_drv+0xbc/0x110
       __device_attach+0x120/0x220
       device_initial_probe+0x20/0x2c
       bus_probe_device+0x54/0x100
       device_add+0xae8/0xc2c
       platform_device_add+0x278/0x3b8
       platform_device_register_full+0x238/0x2ac
       acpi_create_platform_device+0x2dc/0x3a8
       acpi_bus_attach+0x390/0x3cc
       acpi_bus_attach+0x108/0x3cc
       acpi_bus_attach+0x108/0x3cc
       acpi_bus_attach+0x108/0x3cc
       acpi_bus_scan+0x7c/0xb0
       acpi_scan_init+0xe4/0x304
       acpi_init+0x100/0x114
       do_one_initcall+0x348/0x6a0
       do_initcall_level+0x190/0x1fc
       do_basic_setup+0x34/0x4c
       kernel_init_freeable+0x19c/0x260
       kernel_init+0x18/0x338
       ret_from_fork+0x10/0x18

  -> #1 (console_owner){-...}:
       lock_acquire+0x320/0x360
       console_lock_spinning_enable+0x6c/0x7c
       console_unlock+0x4f8/0x96c
       vprintk_emit+0x260/0x31c
       vprintk_default+0x54/0x7c
       vprintk_func+0x218/0x254
       printk+0x7c/0xa4
       get_random_u64+0x1c4/0x1dc
       shuffle_pick_tail+0x40/0xac
       __free_one_page+0x424/0x710
       free_one_page+0x70/0x120
       __free_pages_ok+0x61c/0xa94
       __free_pages_core+0x1bc/0x294
       memblock_free_pages+0x38/0x48
       __free_pages_memory+0xcc/0xfc
       __free_memory_core+0x70/0x78
       free_low_memory_core_early+0x148/0x18c
       memblock_free_all+0x18/0x54
       mem_init+0xb4/0x17c
       mm_init+0x14/0x38
       start_kernel+0x19c/0x530

  -> #0 (&(&zone->lock)->rlock){..-.}:
       validate_chain+0xf6c/0x2e2c
       __lock_acquire+0x868/0xc2c
       lock_acquire+0x320/0x360
       _raw_spin_lock+0x64/0x80
       rmqueue+0x138/0x2050
       get_page_from_freelist+0x474/0x688
       __alloc_pages_nodemask+0x3b4/0x18dc
       alloc_pages_current+0xd0/0xe0
       alloc_slab_page+0x2b4/0x5e0
       new_slab+0xc8/0x6bc
       ___slab_alloc+0x3b8/0x640
       kmem_cache_alloc+0x4b4/0x588
       __debug_object_init+0x778/0x8b4
       debug_object_init_on_stack+0x40/0x50
       start_flush_work+0x16c/0x3f0
       __flush_work+0xb8/0x124
       flush_work+0x20/0x30
       xlog_cil_force_lsn+0x88/0x204 [xfs]
       xfs_log_force_lsn+0x128/0x1b8 [xfs]
       xfs_file_fsync+0x3c4/0x488 [xfs]
       vfs_fsync_range+0xb0/0xd0
       generic_write_sync+0x80/0xa0 [xfs]
       xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
       xfs_file_write_iter+0x1a0/0x218 [xfs]
       __vfs_write+0x1cc/0x214
       vfs_write+0x12c/0x1a4
       ksys_write+0xb0/0x120
       __arm64_sys_write+0x54/0x88
       el0_svc_handler+0x170/0x240
       el0_sync_handler+0x150/0x250
       el0_sync+0x164/0x180

       other info that might help us debug this:

 Chain exists of:
   &(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&pool->lock/1);
                               lock(&(&port->lock)->rlock);
                               lock(&pool->lock/1);
  lock(&(&zone->lock)->rlock);

                *** DEADLOCK ***

4 locks held by doio/49441:
 #0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4
 #1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at:
xfs_ilock+0x2a8/0x300 [xfs]
 #2: ffff9000129f2390 (rcu_read_lock){....}, at:
rcu_lock_acquire+0x8/0x38
 #3: 60ff000822352818 (&pool->lock/1){-.-.}, at:
start_flush_work+0xd8/0x3f0

               stack backtrace:
CPU: 48 PID: 49441 Comm: doio Tainted: G        W
Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS
L50_5.13_1.11 06/18/2019
Call trace:
 dump_backtrace+0x0/0x248
 show_stack+0x20/0x2c
 dump_stack+0xe8/0x150
 print_circular_bug+0x368/0x380
 check_noncircular+0x28c/0x294
 validate_chain+0xf6c/0x2e2c
 __lock_acquire+0x868/0xc2c
 lock_acquire+0x320/0x360
 _raw_spin_lock+0x64/0x80
 rmqueue+0x138/0x2050
 get_page_from_freelist+0x474/0x688
 __alloc_pages_nodemask+0x3b4/0x18dc
 alloc_pages_current+0xd0/0xe0
 alloc_slab_page+0x2b4/0x5e0
 new_slab+0xc8/0x6bc
 ___slab_alloc+0x3b8/0x640
 kmem_cache_alloc+0x4b4/0x588
 __debug_object_init+0x778/0x8b4
 debug_object_init_on_stack+0x40/0x50
 start_flush_work+0x16c/0x3f0
 __flush_work+0xb8/0x124
 flush_work+0x20/0x30
 xlog_cil_force_lsn+0x88/0x204 [xfs]
 xfs_log_force_lsn+0x128/0x1b8 [xfs]
 xfs_file_fsync+0x3c4/0x488 [xfs]
 vfs_fsync_range+0xb0/0xd0
 generic_write_sync+0x80/0xa0 [xfs]
 xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
 xfs_file_write_iter+0x1a0/0x218 [xfs]
 __vfs_write+0x1cc/0x214
 vfs_write+0x12c/0x1a4
 ksys_write+0xb0/0x120
 __arm64_sys_write+0x54/0x88
 el0_svc_handler+0x170/0x240
 el0_sync_handler+0x150/0x250
 el0_sync+0x164/0x180

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/1573679785-21068-1-git-send-email-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Jul 22, 2022
…tion

commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447 upstream.

Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.

Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:

 #1> mkdir -p /sys/fs/cgroup/misc/a/b
 #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
 #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
 #4> PID=$!
 #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
 #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.

After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:

 1. cgroup_migrate_add_src() is called on B for the leader.

 2. cgroup_migrate_add_src() is called on A for the other threads.

 3. cgroup_migrate_prepare_dst() is called. It scans the src list.

 4. It notices that B wants to migrate to A, so it tries to A to the dst
    list but realizes that its ->mg_preload_node is already busy.

 5. and then it notices A wants to migrate to A as it's an identity
    migration, it culls it by list_del_init()'ing its ->mg_preload_node and
    putting references accordingly.

 6. The rest of migration takes place with B on the src list but nothing on
    the dst list.

This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.

This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.

This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Jul 29, 2022
In 2019, Sergey fixed a lockdep splat with 15341b1dd409 ("char/random:
silence a lockdep splat with printk()"), but that got reverted soon
after from 4.19 because back then it apparently caused various problems.
But the issue it was fixing is still there, and more generally, many
patches turning printk() into printk_deferred() have landed since,
making me suspect it's okay to try this out again.

This should fix the following deadlock found by the kernel test robot:

[   18.287691] WARNING: possible circular locking dependency detected
[   18.287692] 4.19.248-00165-g3d1f971aa81f #1 Not tainted
[   18.287693] ------------------------------------------------------
[   18.287712] stop/202 is trying to acquire lock:
[   18.287713] (ptrval) (console_owner){..-.}, at: console_unlock (??:?)
[   18.287717]
[   18.287718] but task is already holding lock:
[   18.287718] (ptrval) (&(&port->lock)->rlock){-...}, at: pty_write (pty.c:?)
[   18.287722]
[   18.287722] which lock already depends on the new lock.
[   18.287723]
[   18.287724]
[   18.287725] the existing dependency chain (in reverse order) is:
[   18.287725]
[   18.287726] -> #2 (&(&port->lock)->rlock){-...}:
[   18.287729] validate_chain+0x84a/0xe00
[   18.287729] __lock_acquire (lockdep.c:?)
[   18.287730] lock_acquire (??:?)
[   18.287731] _raw_spin_lock_irqsave (??:?)
[   18.287732] tty_port_tty_get (??:?)
[   18.287733] tty_port_default_wakeup (tty_port.c:?)
[   18.287734] tty_port_tty_wakeup (??:?)
[   18.287734] uart_write_wakeup (??:?)
[   18.287735] serial8250_tx_chars (??:?)
[   18.287736] serial8250_handle_irq (??:?)
[   18.287737] serial8250_default_handle_irq (8250_port.c:?)
[   18.287738] serial8250_interrupt (8250_core.c:?)
[   18.287738] __handle_irq_event_percpu (??:?)
[   18.287739] handle_irq_event_percpu (??:?)
[   18.287740] handle_irq_event (??:?)
[   18.287741] handle_edge_irq (??:?)
[   18.287742] handle_irq (??:?)
[   18.287742] do_IRQ (??:?)
[   18.287743] common_interrupt (entry_32.o:?)
[   18.287744] _raw_spin_unlock_irqrestore (??:?)
[   18.287745] uart_write (serial_core.c:?)
[   18.287746] process_output_block (n_tty.c:?)
[   18.287747] n_tty_write (n_tty.c:?)
[   18.287747] tty_write (tty_io.c:?)
[   18.287748] __vfs_write (??:?)
[   18.287749] vfs_write (??:?)
[   18.287750] ksys_write (??:?)
[   18.287750] sys_write (??:?)
[   18.287751] do_fast_syscall_32 (??:?)
[   18.287752] entry_SYSENTER_32 (??:?)
[   18.287752]
[   18.287753] -> #1 (&port_lock_key){-.-.}:
[   18.287756]
[   18.287756] -> #0 (console_owner){..-.}:
[   18.287759] check_prevs_add (lockdep.c:?)
[   18.287760] validate_chain+0x84a/0xe00
[   18.287761] __lock_acquire (lockdep.c:?)
[   18.287761] lock_acquire (??:?)
[   18.287762] console_unlock (??:?)
[   18.287763] vprintk_emit (??:?)
[   18.287764] vprintk_default (??:?)
[   18.287764] vprintk_func (??:?)
[   18.287765] printk (??:?)
[   18.287766] get_random_u32 (??:?)
[   18.287767] shuffle_freelist (slub.c:?)
[   18.287767] allocate_slab (slub.c:?)
[   18.287768] new_slab (slub.c:?)
[   18.287769] ___slab_alloc+0x6d0/0xb20
[   18.287770] __slab_alloc+0xd6/0x2e0
[   18.287770] __kmalloc (??:?)
[   18.287771] tty_buffer_alloc (tty_buffer.c:?)
[   18.287772] __tty_buffer_request_room (tty_buffer.c:?)
[   18.287773] tty_insert_flip_string_fixed_flag (??:?)
[   18.287774] pty_write (pty.c:?)
[   18.287775] process_output_block (n_tty.c:?)
[   18.287776] n_tty_write (n_tty.c:?)
[   18.287777] tty_write (tty_io.c:?)
[   18.287778] __vfs_write (??:?)
[   18.287779] vfs_write (??:?)
[   18.287780] ksys_write (??:?)
[   18.287780] sys_write (??:?)
[   18.287781] do_fast_syscall_32 (??:?)
[   18.287782] entry_SYSENTER_32 (??:?)
[   18.287783]
[   18.287783] other info that might help us debug this:
[   18.287784]
[   18.287785] Chain exists of:
[   18.287785]   console_owner --> &port_lock_key --> &(&port->lock)->rlock
[   18.287789]
[   18.287790]  Possible unsafe locking scenario:
[   18.287790]
[   18.287791]        CPU0                    CPU1
[   18.287792]        ----                    ----
[   18.287792]   lock(&(&port->lock)->rlock);
[   18.287794]                                lock(&port_lock_key);
[   18.287814]                                lock(&(&port->lock)->rlock);
[   18.287815]   lock(console_owner);
[   18.287817]
[   18.287818]  *** DEADLOCK ***
[   18.287818]
[   18.287819] 6 locks held by stop/202:
[   18.287820] #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read (??:?)
[   18.287823] #1: (ptrval) (&tty->atomic_write_lock){+.+.}, at: tty_write_lock (tty_io.c:?)
[   18.287826] #2: (ptrval) (&o_tty->termios_rwsem/1){++++}, at: n_tty_write (n_tty.c:?)
[   18.287830] #3: (ptrval) (&ldata->output_lock){+.+.}, at: process_output_block (n_tty.c:?)
[   18.287834] #4: (ptrval) (&(&port->lock)->rlock){-...}, at: pty_write (pty.c:?)
[   18.287838] #5: (ptrval) (console_lock){+.+.}, at: console_trylock_spinning (printk.c:?)
[   18.287841]
[   18.287842] stack backtrace:
[   18.287843] CPU: 0 PID: 202 Comm: stop Not tainted 4.19.248-00165-g3d1f971aa81f #1
[   18.287843] Call Trace:
[   18.287844] dump_stack (??:?)
[   18.287845] print_circular_bug.cold+0x78/0x8b
[   18.287846] check_prev_add+0x66a/0xd20
[   18.287847] check_prevs_add (lockdep.c:?)
[   18.287848] validate_chain+0x84a/0xe00
[   18.287848] __lock_acquire (lockdep.c:?)
[   18.287849] lock_acquire (??:?)
[   18.287850] ? console_unlock (??:?)
[   18.287851] console_unlock (??:?)
[   18.287851] ? console_unlock (??:?)
[   18.287852] ? native_save_fl (??:?)
[   18.287853] vprintk_emit (??:?)
[   18.287854] vprintk_default (??:?)
[   18.287855] vprintk_func (??:?)
[   18.287855] printk (??:?)
[   18.287856] get_random_u32 (??:?)
[   18.287857] ? shuffle_freelist (slub.c:?)
[   18.287858] shuffle_freelist (slub.c:?)
[   18.287858] ? page_address (??:?)
[   18.287859] allocate_slab (slub.c:?)
[   18.287860] new_slab (slub.c:?)
[   18.287861] ? pvclock_clocksource_read (??:?)
[   18.287862] ___slab_alloc+0x6d0/0xb20
[   18.287862] ? kvm_sched_clock_read (kvmclock.c:?)
[   18.287863] ? __slab_alloc+0xbc/0x2e0
[   18.287864] ? native_wbinvd (paravirt.c:?)
[   18.287865] __slab_alloc+0xd6/0x2e0
[   18.287865] __kmalloc (??:?)
[   18.287866] ? __lock_acquire (lockdep.c:?)
[   18.287867] ? tty_buffer_alloc (tty_buffer.c:?)
[   18.287868] tty_buffer_alloc (tty_buffer.c:?)
[   18.287869] __tty_buffer_request_room (tty_buffer.c:?)
[   18.287869] tty_insert_flip_string_fixed_flag (??:?)
[   18.287870] pty_write (pty.c:?)
[   18.287871] process_output_block (n_tty.c:?)
[   18.287872] n_tty_write (n_tty.c:?)
[   18.287873] ? print_dl_stats (??:?)
[   18.287874] ? n_tty_ioctl (n_tty.c:?)
[   18.287874] tty_write (tty_io.c:?)
[   18.287875] ? n_tty_ioctl (n_tty.c:?)
[   18.287876] ? tty_write_unlock (tty_io.c:?)
[   18.287877] __vfs_write (??:?)
[   18.287877] vfs_write (??:?)
[   18.287878] ? __fget_light (file.c:?)
[   18.287879] ksys_write (??:?)

Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Lech Perczak <l.perczak@camlintechnologies.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/Ytz+lo4zRQYG3JUR@xsang-OptiPlex-9020
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Aug 25, 2022
commit dc4d31684974d140250f3ee612c3f0cab13b3146 upstream.

[BUG]
If we have a btrfs image with dirty log, along with an unsupported RO
compatible flag:

log_root		30474240
...
compat_flags		0x0
compat_ro_flags		0x40000003
			( FREE_SPACE_TREE |
			  FREE_SPACE_TREE_VALID |
			  unknown flag: 0x40000000 )

Then even if we can only mount it RO, we will still cause metadata
update for log replay:

  BTRFS info (device dm-1): flagging fs with big metadata feature
  BTRFS info (device dm-1): using free space tree
  BTRFS info (device dm-1): has skinny extents
  BTRFS info (device dm-1): start tree-log replay

This is definitely against RO compact flag requirement.

[CAUSE]
RO compact flag only forces us to do RO mount, but we will still do log
replay for plain RO mount.

Thus this will result us to do log replay and update metadata.

This can be very problematic for new RO compat flag, for example older
kernel can not understand v2 cache, and if we allow metadata update on
RO mount and invalidate/corrupt v2 cache.

[FIX]
Just reject the mount unless rescue=nologreplay is provided:

  BTRFS error (device dm-1): cannot replay dirty log with unsupport optional features (0x40000000), try rescue=nologreplay instead

We don't want to set rescue=nologreply directly, as this would make the
end user to read the old data, and cause confusion.

Since the such case is really rare, we're mostly fine to just reject the
mount with an error message, which also includes the proper workaround.

CC: stable@vger.kernel.org #4.9+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jenkins-verticloud pushed a commit that referenced this pull request Sep 15, 2022
[ Upstream commit 84a53580c5d2138c7361c7c3eea5b31827e63b35 ]

The SRv6 layer allows defining HMAC data that can later be used to sign IPv6
Segment Routing Headers. This configuration is realised via netlink through
four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and
SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual
length of the SECRET attribute, it is possible to provide invalid combinations
(e.g., secret = "", secretlen = 64). This case is not checked in the code and
with an appropriately crafted netlink message, an out-of-bounds read of up
to 64 bytes (max secret length) can occur past the skb end pointer and into
skb_shared_info:

Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
208		memcpy(hinfo->secret, secret, slen);
(gdb) bt
 #0  seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
 #1  0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600,
    extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>,
    family=<optimized out>) at net/netlink/genetlink.c:731
 #2  0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00,
    family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775
 #3  genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792
 #4  0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>)
    at net/netlink/af_netlink.c:2501
 #5  0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803
 #6  0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000)
    at net/netlink/af_netlink.c:1319
 #7  netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>)
    at net/netlink/af_netlink.c:1345
 #8  0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921
...
(gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end
$1 = 0xffff88800b1b76c0
(gdb) p/x secret
$2 = 0xffff88800b1b76c0
(gdb) p slen
$3 = 64 '@'

The OOB data can then be read back from userspace by dumping HMAC state. This
commit fixes this by ensuring SECRETLEN cannot exceed the actual length of
SECRET.

Reported-by: Lucas Leong <wmliang.tw@gmail.com>
Tested: verified that EINVAL is correctly returned when secretlen > len(secret)
Fixes: 4f4853d ("ipv6: sr: implement API to control SR HMAC structure")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Jan 18, 2023
…g the sock

[ Upstream commit 3cf7203ca620682165706f70a1b12b5194607dce ]

There is a race condition in vxlan that when deleting a vxlan device
during receiving packets, there is a possibility that the sock is
released after getting vxlan_sock vs from sk_user_data. Then in
later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got
NULL pointer dereference. e.g.

   #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
   #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
   #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
   #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
   #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
   #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
   #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
      [exception RIP: vxlan_ecn_decapsulate+0x3b]
      RIP: ffffffffc1014e7b  RSP: ffffa25ec6978cb0  RFLAGS: 00010246
      RAX: 0000000000000008  RBX: ffff8aa000888000  RCX: 0000000000000000
      RDX: 000000000000000e  RSI: ffff8a9fc7ab803e  RDI: ffff8a9fd1168700
      RBP: ffff8a9fc7ab803e   R8: 0000000000700000   R9: 00000000000010ae
      R10: ffff8a9fcb748980  R11: 0000000000000000  R12: ffff8a9fd1168700
      R13: ffff8aa000888000  R14: 00000000002a0000  R15: 00000000000010ae
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
   #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
   #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
  #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
  #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
  #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
  #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
  #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
  #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
  #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
  #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
  #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3

Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh

Fix this by waiting for all sk_user_data reader to finish before
releasing the sock.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Jan 18, 2023
[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ]

Commit 9130b8d ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.

When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.

We could see mount.nfs process hung in D state with multiple mount.nfs
are executed in parallel.  The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.

PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
 #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.

When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall.  And the process waiting for the msg
corresponding to service A will be woken up.

Actual scheduing of that process might be after rpc.gssd processes the
next msg.  In rpc_pipe_generic_upcall it clears msg->errno (for A).
The process is scheduled to see gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.

This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.

Signed-off-by: minoura makoto <minoura@valinux.co.jp>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 9130b8d ("SUNRPC: allow for upcalls for same uid but different gss service")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jenkins-verticloud pushed a commit that referenced this pull request Feb 6, 2023
[ Upstream commit 6c4ca03bd890566d873e3593b32d034bf2f5a087 ]

During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:

crash> foreach UN bt
...
PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
...
 #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
 #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...

PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
...
 #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c00000003721fc60] process_one_work at c00000000019e5c4
...

PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
...
 #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c000000037247c60] process_one_work at c00000000019e5c4
...

PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
...:1

 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...

Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks.  If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.

Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:

crash> foreach UN bt
PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
...
 #4 [c000000006867a60] rtnl_lock at c000000000c940d8
 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
 #6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
...
 #3 [c000000036c1bb70] msleep at c000000000259e6c
 #4 [c000000036c1bba0] napi_disable at c000000000c6b848
 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
 #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...

This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.

Fixes: db84bf4 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants