Removed trailing spaces at the end of paragraphs #256

chaityabshah · 2016-02-08T22:07:31Z

Making a readme parser, found some trailing spaces that should be removed.

smclt30p · 2016-02-09T20:54:48Z

Linus does not pull from GitHub.

xfstests/011 failed in node with small_size filesystem. Can be reproduced by following script: DEV_LIST="/dev/vdd /dev/vde" DEV_REPLACE="/dev/vdf" do_test() { local mkfs_opt="$1" local size="$2" dmesg -c >/dev/null umount $SCRATCH_MNT &>/dev/null echo mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}" mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1 mount "${DEV_LIST[0]}" $SCRATCH_MNT echo -n "Writing big files" dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1 for ((i = 1; i <= size; i++)); do echo -n . /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1 done echo echo Start replace btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || { dmesg return 1 } return 0 } # Set size to value near fs size # for example, 1897 can trigger this bug in 2.6G device. # ./do_test "-d raid1 -m raid1" 1897 System will report replace fail with following warning in dmesg: [ 134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started [ 135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28 [ 135.543505] ------------[ cut here ]------------ [ 135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440() [ 135.545276] Modules linked in: [ 135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 coreos#256 [ 135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [ 135.547798] ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000 [ 135.548774] ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4 [ 135.549774] ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70 [ 135.550758] Call Trace: [ 135.551086] [<ffffffff817fe7b5>] dump_stack+0x44/0x55 [ 135.551737] [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0 [ 135.552487] [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20 [ 135.553211] [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440 [ 135.554051] [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0 [ 135.554722] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0 [ 135.555506] [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0 [ 135.556304] [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580 [ 135.557009] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0 [ 135.557855] [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70 [ 135.558669] [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90 [ 135.559374] [<ffffffff81202124>] SyS_ioctl+0x74/0x80 [ 135.559987] [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f [ 135.560842] ---[ end trace 2a5c1fc3205abbdd ]--- Reason: When big data writen to fs, the whole free space will be allocated for data chunk. And operation as scrub need to set_block_ro(), and when there is only one metadata chunk in system(or other metadata chunks are all full), the function will try to allocate a new chunk, and failed because no space in device. Fix: When set_block_ro failed for metadata chunk, it is not a problem because scrub_lock paused commit_trancaction in same time, and metadata are always cowed, so the on-the-fly writepages will not write data into same place with scrub/replace. Let replace continue in this case is no problem. Tested by above script, and xfstests/011, plus 100 times xfstests/070. Changelog v1->v2: 1: Add detail comments in source and commit-message. 2: Add dmesg detail into commit-message. 3: Limit return value of -ENOSPC to be passed. All suggested by: Filipe Manana <fdmanana@gmail.com> Suggested-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>

ancaemanuel · 2016-02-27T17:59:12Z

Tutorial: https://www.youtube.com/watch?v=LLBrBBImJt4
Setting git send mail: https://coderwall.com/p/dp-gka/setting-up-git-send-email-with-gmail
Use get_maintainer.pl
Use the mailing lists.
Close this please, and tell to others.

(cherry picked from commit 39af6b1) The perf cpu offline callback takes down all cpu context events and releases swhash->swevent_hlist. This could race with task context software event being just scheduled on this cpu via perf_swevent_add while cpu hotplug code already cleaned up event's data. The race happens in the gap between the cpu notifier code and the cpu being actually taken down. Note that only cpu ctx events are terminated in the perf cpu hotplug code. It's easily reproduced with: $ perf record -e faults perf bench sched pipe while putting one of the cpus offline: # echo 0 > /sys/devices/system/cpu/cpu1/online Console emits following warning: WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0() Modules linked in: CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G W 3.14.0+ torvalds#256 Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008 0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005 0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046 ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001 Call Trace: [<ffffffff81665a23>] dump_stack+0x4f/0x7c [<ffffffff8104732c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff8104737a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8110fb3d>] perf_swevent_add+0x18d/0x1a0 [<ffffffff811162ae>] event_sched_in.isra.75+0x9e/0x1f0 [<ffffffff8111646a>] group_sched_in+0x6a/0x1f0 [<ffffffff81083dd5>] ? sched_clock_local+0x25/0xa0 [<ffffffff811167e6>] ctx_sched_in+0x1f6/0x450 [<ffffffff8111757b>] perf_event_sched_in+0x6b/0xa0 [<ffffffff81117a4b>] perf_event_context_sched_in+0x7b/0xc0 [<ffffffff81117ece>] __perf_event_task_sched_in+0x43e/0x460 [<ffffffff81096f1e>] ? put_lock_stats.isra.18+0xe/0x30 [<ffffffff8107b3c8>] finish_task_switch+0xb8/0x100 [<ffffffff8166a7de>] __schedule+0x30e/0xad0 [<ffffffff81172dd2>] ? pipe_read+0x3e2/0x560 [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70 [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70 [<ffffffff8166b464>] preempt_schedule_irq+0x44/0x70 [<ffffffff816707f0>] retint_kernel+0x20/0x30 [<ffffffff8109e60a>] ? lockdep_sys_exit+0x1a/0x90 [<ffffffff812a4234>] lockdep_sys_exit_thunk+0x35/0x67 [<ffffffff81679321>] ? sysret_check+0x5/0x56 Fixing this by tracking the cpu hotplug state and displaying the WARN only if current cpu is initialized properly. Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: Ifb8d57a07a372ce20d7a4d31931f3b980f9e553b

lkl tools: fix checkpath.sh error when running directly on top of the…

Since d2852a2 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e7 ("bpf: fix unlocking of jited image when module ronx not set") that uses the former, Fengguang reported random corruptions on his i386 test machine [1]. On i386 there is no JIT available, and since his kernel config doesn't have kernel modules enabled, there was also no DEBUG_SET_MODULE_RONX enabled before which would set interpreted bpf_prog image as read-only like we do in various other cases for quite some time now, e.g. x86_64, arm64, etc. Thus, the difference with above commits was that we now used set_memory_ro() and set_memory_rw() on i386, which resulted in these issues. When reproducing this with Fengguang's config and qemu image, I changed lib/test_bpf.c to be run during boot instead of relying on trinity to fiddle with cBPF. The issues I saw with the BPF test suite when set_memory_ro() and set_memory_rw() is used to write protect image on i386 is that after a number of tests I noticed a corruption happening in bpf_prog_realloc(). Specifically, fp_old's content gets corrupted right *after* the (unrelated) __vmalloc() call and contains only zeroes right after the call instead of the original prog data. fp_old should have been freed later on via __bpf_prog_free() *after* we copied all the data over to the newly allocated fp. Result looks like: [...] [ 13.107240] test_bpf: torvalds#249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:0 17 PASS [ 13.108182] test_bpf: torvalds#250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:0 17 PASS [ 13.109206] test_bpf: torvalds#251 JMP_JA: Jump, gap, jump, ... jited:0 16 PASS [ 13.110493] test_bpf: torvalds#252 BPF_MAXINSNS: Maximum possible literals jited:0 12 PASS [ 13.111885] test_bpf: torvalds#253 BPF_MAXINSNS: Single literal jited:0 8 PASS [ 13.112804] test_bpf: torvalds#254 BPF_MAXINSNS: Run/add until end jited:0 6341 PASS [ 13.177195] test_bpf: torvalds#255 BPF_MAXINSNS: Too many instructions PASS [ 13.177689] test_bpf: torvalds#256 BPF_MAXINSNS: Very long jump jited:0 9 PASS [ 13.178611] test_bpf: torvalds#257 BPF_MAXINSNS: Ctx heavy transformations [ 13.178713] BUG: unable to handle kernel NULL pointer dereference at 00000034 [ 13.179740] IP: bpf_prog_realloc+0x5b/0x90 [ 13.180017] *pde = 00000000 [ 13.180017] [ 13.180017] Oops: 0002 [#1] DEBUG_PAGEALLOC [ 13.180017] CPU: 0 PID: 1 Comm: swapper Not tainted 4.10.0-57268-gd627975-dirty torvalds#50 [ 13.180017] task: 401ec000 task.stack: 401f2000 [ 13.180017] EIP: bpf_prog_realloc+0x5b/0x90 [ 13.180017] EFLAGS: 00210246 CPU: 0 [ 13.180017] EAX: 00000000 EBX: 57ae1000 ECX: 00000000 EDX: 57ae1000 [ 13.180017] ESI: 00000019 EDI: 57b07000 EBP: 401f3e74 ESP: 401f3e68 [ 13.180017] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 13.180017] CR0: 80050033 CR2: 00000034 CR3: 12cb1000 CR4: 00000610 [ 13.180017] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 13.180017] DR6: fffe0ff0 DR7: 00000400 [ 13.180017] Call Trace: [ 13.180017] bpf_prepare_filter+0x317/0x3a0 [ 13.180017] bpf_prog_create+0x65/0xa0 [ 13.180017] test_bpf_init+0x1ca/0x628 [ 13.180017] ? test_hexdump_init+0xb5/0xb5 [ 13.180017] do_one_initcall+0x7c/0x11c [...] When using trinity from Fengguang's reproducer, the corruptions were at inconsistent places, presumably from code dealing with allocations and seeing similar effects as mentioned above. Not using set_memory_ro() and set_memory_rw() lets the test suite run just fine as expected, thus it looks like using set_memory_*() on i386 seems broken and mentioned commits just uncovered it. Also, for checking, I enabled DEBUG_RODATA_TEST for that kernel. Latter shows that memory protecting the kernel seems not working either on i386 (!). Test suite output: [...] [ 12.692836] Write protecting the kernel text: 13416k [ 12.693309] Write protecting the kernel read-only data: 5292k [ 12.693802] rodata_test: test data was not read only [...] Work-around to not enable ARCH_HAS_SET_MEMORY for i386 is not optimal as it doesn't fix the issue in presumably broken set_memory_*(), but it at least avoids people avoid having to deal with random corruptions that are hard to track down for the time being until a real fix can be found. [1] https://lkml.org/lkml/2017/3/2/648 Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexei Starovoitov <ast@kernel.org>

User can supply negative blue flame index and cause to "bfregn > bfregi->num_dyn_bfregs" protection check return "false". It will cause to below error while trying to access sys_pages[]. BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400 Read of size 4 at addr ffff880065561904 by task syz-executor446/314 CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ torvalds#256 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xef/0x17e print_address_description+0x83/0x3b0 kasan_report+0x18d/0x4d0 bfregn_to_uar_index+0x34f/0x400 create_user_qp+0x272/0x227d create_qp_common+0x32eb/0x43e0 mlx5_ib_create_qp+0x379/0x1ca0 create_qp.isra.5+0xc94/0x22d0 ib_uverbs_create_qp+0x21b/0x2a0 ib_uverbs_write+0xc2c/0x1010 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x433679 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679 RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003 RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8 R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000 R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006 Allocated by task 314: kasan_kmalloc+0xa0/0xd0 __kmalloc+0x1a9/0x510 mlx5_ib_alloc_ucontext+0x966/0x2620 ib_uverbs_get_context+0x23f/0xa60 ib_uverbs_write+0xc2c/0x1010 __vfs_write+0x10d/0x720 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1: __kasan_slab_free+0x12e/0x180 kfree+0x159/0x630 kvfree+0x37/0x50 single_release+0x8e/0xf0 __fput+0x2d8/0x900 task_work_run+0x102/0x1f0 exit_to_usermode_loop+0x159/0x1c0 do_syscall_64+0x408/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff880065561100 which belongs to the cache kmalloc-4096 of size 4096 The buggy address is located 2052 bytes inside of 4096-byte region [ffff880065561100, ffff880065562100) The buggy address belongs to the page: page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480 raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Cc: <stable@vger.kernel.org> # 4.15 Fixes: 1ee47ab ("IB/mlx5: Enable QP creation with a given blue flame index") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

User's supplied index is checked again total number of system pages, but this number already includes num_static_sys_pages, so addition of that value to supplied index causes to below error while trying to access sys_pages[]. BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400 Read of size 4 at addr ffff880065561904 by task syz-executor446/314 CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ torvalds#256 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xef/0x17e print_address_description+0x83/0x3b0 kasan_report+0x18d/0x4d0 bfregn_to_uar_index+0x34f/0x400 create_user_qp+0x272/0x227d create_qp_common+0x32eb/0x43e0 mlx5_ib_create_qp+0x379/0x1ca0 create_qp.isra.5+0xc94/0x22d0 ib_uverbs_create_qp+0x21b/0x2a0 ib_uverbs_write+0xc2c/0x1010 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x433679 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679 RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003 RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8 R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000 R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006 Allocated by task 314: kasan_kmalloc+0xa0/0xd0 __kmalloc+0x1a9/0x510 mlx5_ib_alloc_ucontext+0x966/0x2620 ib_uverbs_get_context+0x23f/0xa60 ib_uverbs_write+0xc2c/0x1010 __vfs_write+0x10d/0x720 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1: __kasan_slab_free+0x12e/0x180 kfree+0x159/0x630 kvfree+0x37/0x50 single_release+0x8e/0xf0 __fput+0x2d8/0x900 task_work_run+0x102/0x1f0 exit_to_usermode_loop+0x159/0x1c0 do_syscall_64+0x408/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff880065561100 which belongs to the cache kmalloc-4096 of size 4096 The buggy address is located 2052 bytes inside of 4096-byte region [ffff880065561100, ffff880065562100) The buggy address belongs to the page: page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480 raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Cc: <stable@vger.kernel.org> # 4.15 Fixes: 1ee47ab ("IB/mlx5: Enable QP creation with a given blue flame index") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

User's supplied index is checked again total number of system pages, but this number already includes num_static_sys_pages, so addition of that value to supplied index causes to below error while trying to access sys_pages[]. BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400 Read of size 4 at addr ffff880065561904 by task syz-executor446/314 CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ torvalds#256 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xef/0x17e print_address_description+0x83/0x3b0 kasan_report+0x18d/0x4d0 bfregn_to_uar_index+0x34f/0x400 create_user_qp+0x272/0x227d create_qp_common+0x32eb/0x43e0 mlx5_ib_create_qp+0x379/0x1ca0 create_qp.isra.5+0xc94/0x22d0 ib_uverbs_create_qp+0x21b/0x2a0 ib_uverbs_write+0xc2c/0x1010 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x433679 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679 RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003 RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8 R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000 R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006 Allocated by task 314: kasan_kmalloc+0xa0/0xd0 __kmalloc+0x1a9/0x510 mlx5_ib_alloc_ucontext+0x966/0x2620 ib_uverbs_get_context+0x23f/0xa60 ib_uverbs_write+0xc2c/0x1010 __vfs_write+0x10d/0x720 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1: __kasan_slab_free+0x12e/0x180 kfree+0x159/0x630 kvfree+0x37/0x50 single_release+0x8e/0xf0 __fput+0x2d8/0x900 task_work_run+0x102/0x1f0 exit_to_usermode_loop+0x159/0x1c0 do_syscall_64+0x408/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff880065561100 which belongs to the cache kmalloc-4096 of size 4096 The buggy address is located 2052 bytes inside of 4096-byte region [ffff880065561100, ffff880065562100) The buggy address belongs to the page: page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480 raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Cc: <stable@vger.kernel.org> # 4.15 Fixes: 1ee47ab ("IB/mlx5: Enable QP creation with a given blue flame index") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

WARNING: please, no spaces at the start of a line torvalds#250: FILE: kernel/cgroup/cgroup.c:4554: + {$ ERROR: code indent should use tabs where possible torvalds#251: FILE: kernel/cgroup/cgroup.c:4555: + .name = "cpu.pressure",$ WARNING: please, no spaces at the start of a line torvalds#251: FILE: kernel/cgroup/cgroup.c:4555: + .name = "cpu.pressure",$ ERROR: code indent should use tabs where possible torvalds#252: FILE: kernel/cgroup/cgroup.c:4556: + .flags = CFTYPE_NOT_ON_ROOT,$ WARNING: please, no spaces at the start of a line torvalds#252: FILE: kernel/cgroup/cgroup.c:4556: + .flags = CFTYPE_NOT_ON_ROOT,$ ERROR: code indent should use tabs where possible torvalds#253: FILE: kernel/cgroup/cgroup.c:4557: + .seq_show = cgroup_cpu_pressure_show,$ WARNING: please, no spaces at the start of a line torvalds#253: FILE: kernel/cgroup/cgroup.c:4557: + .seq_show = cgroup_cpu_pressure_show,$ WARNING: please, no spaces at the start of a line torvalds#254: FILE: kernel/cgroup/cgroup.c:4558: + },$ WARNING: please, no spaces at the start of a line torvalds#255: FILE: kernel/cgroup/cgroup.c:4559: + {$ ERROR: code indent should use tabs where possible torvalds#256: FILE: kernel/cgroup/cgroup.c:4560: + .name = "memory.pressure",$ WARNING: please, no spaces at the start of a line torvalds#256: FILE: kernel/cgroup/cgroup.c:4560: + .name = "memory.pressure",$ ERROR: code indent should use tabs where possible torvalds#257: FILE: kernel/cgroup/cgroup.c:4561: + .flags = CFTYPE_NOT_ON_ROOT,$ WARNING: please, no spaces at the start of a line torvalds#257: FILE: kernel/cgroup/cgroup.c:4561: + .flags = CFTYPE_NOT_ON_ROOT,$ ERROR: code indent should use tabs where possible torvalds#258: FILE: kernel/cgroup/cgroup.c:4562: + .seq_show = cgroup_memory_pressure_show,$ WARNING: please, no spaces at the start of a line torvalds#258: FILE: kernel/cgroup/cgroup.c:4562: + .seq_show = cgroup_memory_pressure_show,$ WARNING: please, no spaces at the start of a line torvalds#259: FILE: kernel/cgroup/cgroup.c:4563: + },$ WARNING: please, no spaces at the start of a line torvalds#260: FILE: kernel/cgroup/cgroup.c:4564: + {$ ERROR: code indent should use tabs where possible torvalds#261: FILE: kernel/cgroup/cgroup.c:4565: + .name = "io.pressure",$ WARNING: please, no spaces at the start of a line torvalds#261: FILE: kernel/cgroup/cgroup.c:4565: + .name = "io.pressure",$ ERROR: code indent should use tabs where possible torvalds#262: FILE: kernel/cgroup/cgroup.c:4566: + .flags = CFTYPE_NOT_ON_ROOT,$ WARNING: please, no spaces at the start of a line torvalds#262: FILE: kernel/cgroup/cgroup.c:4566: + .flags = CFTYPE_NOT_ON_ROOT,$ ERROR: code indent should use tabs where possible torvalds#263: FILE: kernel/cgroup/cgroup.c:4567: + .seq_show = cgroup_io_pressure_show,$ WARNING: please, no spaces at the start of a line torvalds#263: FILE: kernel/cgroup/cgroup.c:4567: + .seq_show = cgroup_io_pressure_show,$ WARNING: please, no spaces at the start of a line torvalds#264: FILE: kernel/cgroup/cgroup.c:4568: + },$ WARNING: please, no spaces at the start of a line torvalds#322: FILE: kernel/sched/psi.c:424: + cgroup = task->cgroups->dfl_cgrp;$ WARNING: please, no spaces at the start of a line torvalds#323: FILE: kernel/sched/psi.c:425: + while (cgroup && (parent = cgroup_parent(cgroup))) {$ WARNING: suspect code indent for conditional statements (7, 15) torvalds#323: FILE: kernel/sched/psi.c:425: + while (cgroup && (parent = cgroup_parent(cgroup))) { + struct psi_group *group; ERROR: code indent should use tabs where possible torvalds#324: FILE: kernel/sched/psi.c:426: + struct psi_group *group;$ WARNING: please, no spaces at the start of a line torvalds#324: FILE: kernel/sched/psi.c:426: + struct psi_group *group;$ ERROR: code indent should use tabs where possible torvalds#326: FILE: kernel/sched/psi.c:428: + group = cgroup_psi(cgroup);$ WARNING: please, no spaces at the start of a line torvalds#326: FILE: kernel/sched/psi.c:428: + group = cgroup_psi(cgroup);$ ERROR: code indent should use tabs where possible torvalds#327: FILE: kernel/sched/psi.c:429: + psi_group_change(group, cpu, now, clear, set);$ WARNING: please, no spaces at the start of a line torvalds#327: FILE: kernel/sched/psi.c:429: + psi_group_change(group, cpu, now, clear, set);$ ERROR: code indent should use tabs where possible torvalds#329: FILE: kernel/sched/psi.c:431: + cgroup = parent;$ WARNING: please, no spaces at the start of a line torvalds#329: FILE: kernel/sched/psi.c:431: + cgroup = parent;$ WARNING: please, no spaces at the start of a line torvalds#330: FILE: kernel/sched/psi.c:432: + }$ WARNING: braces {} are not necessary for any arm of this statement torvalds#378: FILE: kernel/sched/psi.c:537: + if (task_on_rq_queued(task)) { [...] + } else if (task->in_iowait) { [...] total: 13 errors, 24 warnings, 334 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. NOTE: Whitespace errors detected. You may wish to use scripts/cleanpatch or scripts/cleanfile ./patches/psi-cgroup-support.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Johannes Weiner <jweiner@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

rankPos structure variables value can not be more than 512. So it can easily be declared as U16 rather than U32. It will reduce stack usage of HUF_sort from 256 bytes to 128 bytes original: e92ddbf0 push {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc} e24cb004 sub fp, ip, #4 e24ddc01 sub sp, sp, torvalds#256 ; 0x100 changed: e92ddbf0 push {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc} e24cb004 sub fp, ip, #4 e24dd080 sub sp, sp, torvalds#128 ; 0x80 Link: http://lkml.kernel.org/r/1559552526-4317-3-git-send-email-maninder1.s@samsung.com Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Cc: Amit Sahrawat <a.sahrawat@samsung.com> Cc: David S. Miller <davem@davemloft.net> Cc: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: <pankaj.m@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rankPos structure variables value can not be more than 512. So it can easily be declared as U16 rather than U32. It will reduce stack usage of HUF_sort from 256 bytes to 128 bytes original: e92ddbf0 push {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc} e24cb004 sub fp, ip, #4 e24ddc01 sub sp, sp, torvalds#256 ; 0x100 changed: e92ddbf0 push {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc} e24cb004 sub fp, ip, #4 e24dd080 sub sp, sp, torvalds#128 ; 0x80 Link: http://lkml.kernel.org/r/1559552526-4317-3-git-send-email-maninder1.s@samsung.com Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Cc: Amit Sahrawat <a.sahrawat@samsung.com> Cc: David S. Miller <davem@davemloft.net> Cc: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: <pankaj.m@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

__vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_link() is added to link fdb to vxlan dev. vxlan_fdb_link() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ torvalds#256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0241b83 ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

__vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan dev. vxlan_fdb_insert() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ torvalds#256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0241b83 ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

__vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan dev. vxlan_fdb_insert() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ torvalds#256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0241b83 ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>

[ Upstream commit 7c31e54 ] __vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan dev. vxlan_fdb_insert() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ torvalds#256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0241b83 ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 7c31e54 ] __vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan dev. vxlan_fdb_insert() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ #256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0241b83 ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel torvalds#252 tc_opts_after:OK torvalds#253 tc_opts_append:OK torvalds#254 tc_opts_basic:OK torvalds#255 tc_opts_before:OK torvalds#256 tc_opts_chain_classic:OK torvalds#257 tc_opts_chain_mixed:OK torvalds#258 tc_opts_delete_empty:OK torvalds#259 tc_opts_demixed:OK torvalds#260 tc_opts_detach:OK torvalds#261 tc_opts_detach_after:OK torvalds#262 tc_opts_detach_before:OK torvalds#263 tc_opts_dev_cleanup:OK torvalds#264 tc_opts_invalid:OK torvalds#265 tc_opts_max:OK torvalds#266 tc_opts_mixed:OK torvalds#267 tc_opts_prepend:OK torvalds#268 tc_opts_query:OK torvalds#269 tc_opts_query_attach:OK <--- (new test) torvalds#270 tc_opts_replace:OK torvalds#271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel torvalds#252 tc_opts_after:OK torvalds#253 tc_opts_append:OK torvalds#254 tc_opts_basic:OK torvalds#255 tc_opts_before:OK torvalds#256 tc_opts_chain_classic:OK torvalds#257 tc_opts_chain_mixed:OK torvalds#258 tc_opts_delete_empty:OK torvalds#259 tc_opts_demixed:OK torvalds#260 tc_opts_detach:OK torvalds#261 tc_opts_detach_after:OK torvalds#262 tc_opts_detach_before:OK torvalds#263 tc_opts_dev_cleanup:OK torvalds#264 tc_opts_invalid:OK torvalds#265 tc_opts_max:OK torvalds#266 tc_opts_mixed:OK torvalds#267 tc_opts_prepend:OK torvalds#268 tc_opts_query:OK <--- (new test) torvalds#269 tc_opts_replace:OK torvalds#270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>