Skip to content
This repository was archived by the owner on Nov 7, 2025. It is now read-only.

Conversation

@rata
Copy link

@rata rata commented Jan 13, 2021

Here it is a c&p of the commit msg. Github doesn't show the "1" and friends correctly, but that is common use on the linux commit msg IIRC.

Alban Crequy reported a race condition userspace faces when we want to
add an fd and make the same syscall return that value1.

Currently two different ioctl() calls are needed by the process handling
the syscalls (agent) for another userspace process (target). Therefore,
it is possible for the agent to do the first ioctl to add a file
descriptor but the target is interrupted (EINTR) before the agent does
the second ioctl() call.

This patch adds a flag to the ADDFD ioctl() so it adds the fd and
returns that value atomically to the target program. This was suggested
by Kees Cook2. This is done by simply changing
seccomp_do_user_notification() to add the fd and return that. In this
way, the target is waked up from the wait in
seccomp_do_user_notification() either to interrupt the syscall or to add
the fd and return it.

The struct seccomp_notif_resp, used when doing SECCOMP_IOCTL_NOTIF_SEND
ioctl() to send a response to the target, has three more fields that we
don't allow to set when doing the addfd ioctl(). The reasons for each
are:

  • val: This will be set to the new allocated fd
  • error: If this is non-zero, the value is ignored. Therefore,
    it is pointless in this case as we want to return the value.
  • flags: The only flag is to let userspace continue to execute the
    syscall. This seems pointless, as we want the syscall to return the
    allocated fd.

Signed-off-by: Rodrigo Campos rodrigo@sdfg.com.ar

How to use

See the selftest for an example

Testing done

Selftests and a similar local test to read from the fd and check it is the expected data.

Missing stuff:

  • Test it with Alban golang examples that hit the race condition
  • Rebase to send for 5.11/5.12 (this is currently based on linux 5.10-rc5, this is of course not good)
  • Remove printk. Left them in case you test it, they were useful for me
  • Add selftests
  • Should I use my personal or kinvolk email?

ardbiesheuvel and others added 30 commits December 21, 2020 11:19
…physical section

The early ATAGS/DT mapping code uses SECTION_SHIFT to mask low order
bits of R2, and decides that no ATAGS/DTB were provided if the resulting
value is 0x0.

This means that on systems where DRAM starts at 0x0 (such as Raspberry
Pi), no explicit mapping of the DT will be created if R2 points into the
first 1 MB section of memory. This was not a problem before, because the
decompressed kernel is loaded at the base of DRAM and mapped using
sections as well, and so as long as the DT is referenced via a virtual
address that uses the same translation (the linear map, in this case),
things work fine.

However, commit 7a1be31 ("9012/1: move device tree mapping out of
linear region") changes this, and now the DT is referenced via a virtual
address that is disjoint from the linear mapping of DRAM, and so we need
the early code to create the DT mapping unconditionally.

So let's create the early DT mapping for any value of R2 != 0x0.

Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
… syscall too

We need r1 to be properly set before activating MMU, otherwise any new
exception taken while saving registers into the stack in syscall
prologs will use the user stack, which is wrong and will even lockup
or crash when KUAP is selected.

Do that by switching the meaning of r11 and r1 until we have saved r1
to the stack: copy r1 into r11 and setup the new stack pointer in r1.
To avoid complicating and impacting all generic and specific prolog
code (and more), copy back r1 into r11 once r11 is save onto
the stack.

We could get rid of copying r1 back and forth at the cost of rewriting
everything to use r1 instead of r11 all the way when CONFIG_VMAP_STACK
is set, but the effort is probably not worth it for now.

Fixes: da7bb43 ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a3d819d5c348cee9783a311d5d3f3ba9b48fd219.1608531452.git.christophe.leroy@csgroup.eu
"it is a used" does not make sense.  Should be "it is used".

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20201216134654.271508-1-lee.jones@linaro.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Renumber the steps in submit-checklist.rst as some numbers were skipped.

Signed-off-by: Milan Lakhani <milan.lakhani@codethink.co.uk>
Link: https://lore.kernel.org/r/1608064956-5512-1-git-send-email-milan.lakhani@codethink.co.uk
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Document what a chain of Signed-off-by's in a patch commit message
should mean, explicitly.

This has been carved out from a tip subsystem handbook patchset by
Thomas Gleixner:

  https://lkml.kernel.org/r/20181107171010.421878737@linutronix.de

and incorporates follow-on comments.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20201217183756.GE23634@zn.tnic
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Since the default value of sysctl_max_map_count is defined as
DEFAULT_MAX_MAP_COUNT from mm/util.c

    int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;

DEFAULT_MAX_MAP_COUNT is defined as 65530 (65535-5) in include/linux/mm.h

    #define MAPCOUNT_ELF_CORE_MARGIN        (5)
    #define DEFAULT_MAX_MAP_COUNT   (USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)

Signed-off-by: Fengfei Xi <xi.fengfei@h3c.com>
Link: https://lore.kernel.org/r/20201210082134.36957-1-xi.fengfei@h3c.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Pull 9p update from Dominique Martinet:

 - fix long-standing limitation on open-unlink-fop pattern

 - add refcount to p9_fid (fixes the above and will allow for more
   cleanups and simplifications in the future)

* tag '9p-for-5.11-rc1' of git://github.com/martinetd/linux:
  9p: Remove unnecessary IS_ERR() check
  9p: Uninitialized variable in v9fs_writeback_fid()
  9p: Fix writeback fid incorrectly being attached to dentry
  9p: apply review requests for fid refcounting
  9p: add refcount to p9_fid struct
  fs/9p: search open fids first
  fs/9p: track open fids
  fs/9p: fix create-unlink-getattr idiom
…/kernel/git/gerg/m68knommu

Pull m68knommu updates from Greg Ungerer:

 - cleanup of 68328 code

 - align BSS section to 32bit

* tag 'm68knommu-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: m68328: remove duplicate code
  m68k: m68328: move platform code to separate files
  m68knommu: align BSS section to 4-byte boundaries
…l/git/clk/linux

Pull clk updates from Stephen Boyd:
 "The core framework got some nice improvements this time around. We
  gained the ability to get struct clk pointers from a struct clk_hw so
  that clk providers can consume the clks they provide, if they need to
  do something like that. This has been a long missing part of the clk
  provider API that will help us move away from exposing a struct clk
  pointer in the struct clk_hw. Tracepoints are added for the
  clk_set_rate() "range" functions, similar to the tracepoints we
  already have for clk_set_rate() and we added a column to debugfs to
  help developers understand the hardware enable state of clks in case
  firmware or bootloader state is different than what is expected.
  Overall the core changes are mostly improving the clk driver writing
  experience.

  At the driver level, we have the usual collection of driver updates
  and new drivers for new SoCs. This time around the Qualcomm folks
  introduced a good handful of clk drivers for various parts of three or
  four SoCs. The SiFive folks added a new clk driver for their FU740
  SoCs, coming in second on the diffstat and then Atmel AT91 and Amlogic
  SoCs had lots of work done after that for various new features. One
  last thing to note in the driver area is that the i.MX driver has
  gained a new binding to support SCU clks after being on the list for
  many months. It uses a two cell binding which is sort of rare in clk
  DT bindings. Beyond that we have the usual set of driver fixes and
  tweaks that come from more testing and finding out that some
  configuration was wrong or that a driver could support being built as
  a module.

  Summary:

  Core:
   - Add some trace points for clk_set_rate() "range" functions
   - Add hardware enable information to clk_summary debugfs
   - Replace clk-provider.h with of_clk.h when possible
   - Add devm variant of clk_notifier_register()
   - Add clk_hw_get_clk() to generate a struct clk from a struct clk_hw

  New Drivers:
   - Bindings for Canaan K210 SoC clks
   - Support for SiFive FU740 PRCI
   - Camera clks on Qualcomm SC7180 SoCs
   - GCC and RPMh clks on Qualcomm SDX55 SoCs
   - RPMh clks on Qualcomm SM8350 SoCs
   - LPASS clks on Qualcomm SM8250 SoCs

  Updates:
   - DVFS support for AT91 clk driver
   - Update git repo branch for Renesas clock drivers
   - Add camera (CSI) and video-in (VIN) clocks on Renesas R-Car V3U
   - Add RPC (QSPI/HyperFLASH) clocks on Renesas RZ/G2M, RZ/G2N, and RZ/G2E
   - Stop using __raw_*() I/O accessors in Renesas clk drivers
   - One more conversion of DT bindings to json-schema
   - Make i.MX clk-gate2 driver more flexible
   - New two cell binding for i.MX SCU clks
   - Drop of_match_ptr() in i.MX8 clk drivers
   - Add arch dependencies for Rockchip clk drivers
   - Fix i2s on Rockchip rk3066
   - Add MIPI DSI clks on Amlogic axg and g12 SoCs
   - Support modular builds of Amlogic clk drivers
   - Fix an Amlogic Video PLL clock dependency
   - Samsung Kconfig dependencies updates for better compile test coverage
   - Refactoring of the Samsung PLL clocks driver
   - Small Tegra driver cleanups
   - Minor fixes to Ingenic and VC5 clk drivers
   - Cleanup patches to remove unused variables and plug memory leaks"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (134 commits)
  dt-binding: clock: Document canaan,k210-clk bindings
  dt-bindings: Add Canaan vendor prefix
  clk: vc5: Use "idt,voltage-microvolt" instead of "idt,voltage-microvolts"
  clk: ingenic: Fix divider calculation with div tables
  clk: sunxi-ng: Make sure divider tables have sentinel
  clk: s2mps11: Fix a resource leak in error handling paths in the probe function
  clk: mvebu: a3700: fix the XTAL MODE pin to MPP1_9
  clk: si5351: Wait for bit clear after PLL reset
  clk: at91: sam9x60: remove atmel,osc-bypass support
  clk: at91: sama7g5: register cpu clock
  clk: at91: clk-master: re-factor master clock
  clk: at91: sama7g5: do not allow cpu pll to go higher than 1GHz
  clk: at91: sama7g5: decrease lower limit for MCK0 rate
  clk: at91: sama7g5: remove mck0 from parent list of other clocks
  clk: at91: clk-sam9x60-pll: allow runtime changes for pll
  clk: at91: sama7g5: add 5th divisor for mck0 layout and characteristics
  clk: at91: clk-master: add 5th divisor for mck master
  clk: at91: sama7g5: allow SYS and CPU PLLs to be exported and referenced in DT
  dt-bindings: clock: at91: add sama7g5 pll defines
  clk: at91: sama7g5: fix compilation error
  ...
If emergency system shutdown is called, like by thermal shutdown,
a dm device could be alive when the block device couldn't process
I/O requests anymore. In this state, the handling of I/O errors
by new dm I/O requests or by those already in-flight can lead to
a verity corruption state, which is a misjudgment.

So, skip verity work in response to I/O error when system is shutting
down.

Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The definition of IS_ERR() already applies the unlikely() notation
when checking the error status of the passed pointer. For this
reason there is no need to have the same notation outside of
IS_ERR() itself.

Clean up code by removing redundant notation.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
xa_store() may fail, check the result.

Cc: stable@vger.kernel.org # 5.10
Fixes: 0f21220 ("io_uring: don't rely on weak ->files references")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
this is one of the cases where we need to use d_real() - we are
using more than the name of dentry here.  ->d_sb is used as well,
so in case of hostfs being used as a layer we get the wrong
superblock.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The table for Unicode upcase conversion requires an order-5 allocation,
which may fail on a highly-fragmented system:

 pool-udisksd: page allocation failure: order:5,
 mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),
 cpuset=/,mems_allowed=0
 CPU: 4 PID: 3756880 Comm: pool-udisksd Tainted: G U
 5.8.10-200.fc32.x86_64 #1
 Hardware name: Dell Inc. XPS 13 9360/0PVG6D, BIOS 2.13.0 11/14/2019
 Call Trace:
  dump_stack+0x6b/0x88
  warn_alloc.cold+0x75/0xd9
  ? _cond_resched+0x16/0x40
  ? __alloc_pages_direct_compact+0x144/0x150
  __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
  ? __schedule+0x28a/0x840
  ? __wait_on_bit_lock+0x92/0xa0
  __alloc_pages_nodemask+0x2df/0x320
  kmalloc_order+0x1b/0x80
  kmalloc_order_trace+0x1d/0xa0
  exfat_create_upcase_table+0x115/0x390 [exfat]
  exfat_fill_super+0x3ef/0x7f0 [exfat]
  ? sget_fc+0x1d0/0x240
  ? exfat_init_fs_context+0x120/0x120 [exfat]
  get_tree_bdev+0x15c/0x250
  vfs_get_tree+0x25/0xb0
  do_mount+0x7c3/0xaf0
  ? copy_mount_options+0xab/0x180
  __x64_sys_mount+0x8e/0xd0
  do_syscall_64+0x4d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Make the driver use kvcalloc() to eliminate the issue.

Fixes: 370e812 ("exfat: add nls operations")
Cc: stable@vger.kernel.org #v5.7+
Signed-off-by: Artem Labazov <123321artyom@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
memblock_enforce_memory_limit accepts the maximum memory size not the
maximum address that can be handled by kernel. Fix the function invocation
accordingly.

Fixes: 1bd14a6 ("RISC-V: Remove any memblock representing unusable memory area")
Cc: stable@vger.kernel.org
Reported-by: Bin Meng <bin.meng@windriver.com>
Tested-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The BOSS GT-1 (USB ID 0582:01d6) requires implicit feedback
like other similar BOSS devices. This patch adds this support.

[ rearranged the table entry in the ID order -- tiwai ]

Signed-off-by: Mike Oliphant <oliphant@nostatic.org>
Link: https://lore.kernel.org/r/20201221215533.2511-1-oliphant@nostatic.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
* acpi-scan:
  ACPI: scan: Add Intel Baytrail Mailbox Device to acpi_ignore_dep_ids
  ACPI: scan: Avoid unnecessary second pass in acpi_bus_scan()
  ACPI: scan: Defer enumeration of devices with _DEP lists
  ACPI: scan: Evaluate _DEP before adding the device

* acpi-pnp:
  ACPI: PNP: compare the string length in the matching_id()

* acpi-sleep:
  ACPI: PM: s2idle: Move x86-specific code to the x86 directory
  ACPI: PM: s2idle: Add AMD support to handle _DSM
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The Quanta NL3 laptop has both a headphone output jack and a headset
jack, on the right edge of the chassis.

The pin information suggests that both of these are at the Front.
The PulseAudio is confused to differentiate them so one of the jack
can neither get the jack sense working nor the audio output.

The ALC269_FIXUP_LIFEBOOK chained with ALC269_FIXUP_QUANTA_MUTE can
help to differentiate 2 jacks and get the 'Auto-Mute Mode' working
correctly.

Signed-off-by: Chris Chiu <chiu@endlessos.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201222150459.9545-1-chiu@endlessos.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This Acer Veriton N4640G/N6640G/N2510G desktops have 2 headphone
jacks(front and rear), and a separate Mic In jack.

The rear headphone jack is actually a line out jack but always silent
while playing audio. The front 'Mic In' also fails the jack sensing.
Apply the ALC269_FIXUP_LIFEBOOK to have all audio jacks to work as
expected.

Signed-off-by: Chris Chiu <chiu@endlessos.org>
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201222150459.9545-2-chiu@endlessos.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
fs/block_dev.c is a pretty integral part of the block layer, so make
sure it is mentioned in MAINTAINERS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no point in duplicating the file name in the top of the file
comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Update copyrights for files that have gotten some major rewrites lately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
… Zen CPUs

Voltages and current are reported by Zen CPUs. However, the means
to do so is undocumented, changes from CPU to CPU, and the raw data
is not calibrated. Calibration information is available, but again
not documented. This results in less than perfect user experience,
up to concerns that loading the driver might possibly damage
the hardware (by reporting out-of range voltages). Effectively
support for reporting voltages and current is not maintainable.
Drop it.

Cc: Artem S. Tashkinov <aros@gmx.com>
Cc: Wei Huang <wei.huang2@amd.com>
Tested-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* pm-cpufreq:
  cpufreq: intel_pstate: Use most recent guaranteed performance values
  cpufreq: intel_pstate: Implement the ->adjust_perf() callback
  cpufreq: Add special-purpose fast-switching callback for drivers
  cpufreq: schedutil: Add util to struct sg_cpu
  cppc_cpufreq: replace per-cpu data array with a list
  cppc_cpufreq: expose information on frequency domains
  cppc_cpufreq: clarify support for coordination types
  cppc_cpufreq: use policy->cpu as driver of frequency setting
  ACPI: processor: fix NONE coordination for domain mapping failure
  ACPI: processor: Drop duplicate setting of shared_cpu_map
When filesystem inconsistency is detected with group locked, we
currently try to modify superblock to store error there without
blocking. However this can cause superblock checksum failures (or
DIF/DIX failure) when the superblock is just being written out.

Make error handling code just store error information in ext4_sb_info
structure and copy it to on-disk superblock only in ext4_commit_super().
In case of error happening with group locked, we just postpone the
superblock flushing to a workqueue.

[ Added fixup so that s_first_error_* does not get updated after
  the file system is remounted.
  Also added fix for syzbot failure.  - Ted ]

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201127113405.26867-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+9043030c040ce1849a60@syzkaller.appspotmail.com
Commit cfd7323 ("ext4: add prefetching for block allocation
bitmaps") introduced block bitmap prefetch, and expects to read block
bitmaps of flex_bg through an IO.  However, it seems to ignore the
value range of s_log_groups_per_flex.  In the scenario where the value
of s_log_groups_per_flex is greater than 27, s_mb_prefetch or
s_mb_prefetch_limit will overflow, cause a divide zero exception.

In addition, the logic of calculating nr is also flawed, because the
size of flexbg is fixed during a single mount, but s_mb_prefetch can
be modified, which causes nr to fail to meet the value condition of
[1, flexbg_size].

To solve this problem, we need to set the upper limit of
s_mb_prefetch.  Since we expect to load block bitmaps of a flex_bg
through an IO, we can consider determining a reasonable upper limit
among the IO limit parameters.  After consideration, we chose
BLK_MAX_SEGMENT_SIZE.  This is a good choice to solve divide zero
problem and avoiding performance degradation.

[ Some minor code simplifications to make the changes easy to follow -- TYT ]

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Samuel Liao <samuelliao@tencent.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/1607051143-24508-1-git-send-email-brookxu@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4_bio_write_page does not need wbc parameter, since its parameter
io contains the io_wbc field. The io::io_wbc is initialized by
ext4_io_submit_init which is called in ext4_writepages and
ext4_writepage functions prior to ext4_bio_write_page.
Therefor, when ext4_bio_write_page is called, wbc info
has already been included in io parameter.

Signed-off-by: Lei Chen <lennychen@tencent.com>
Link: https://lore.kernel.org/r/1607669664-25656-1-git-send-email-lennychen@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
@rata rata changed the title seccomp: Support atomic addfd and return it seccomp: Support atomic "addfd and return" Jan 13, 2021
@rata rata force-pushed the rata/seccomp-atomic-addfd-ret branch from 576362d to 76d468a Compare January 13, 2021 18:46
@rata rata force-pushed the rata/seccomp-linux-5.10-rc5 branch from 3c76276 to 2c07343 Compare January 14, 2021 12:28
@rata rata force-pushed the rata/seccomp-atomic-addfd-ret branch from 76d468a to bd9996f Compare January 14, 2021 12:30
rata added 2 commits January 14, 2021 13:34
Alban Crequy reported a race condition userspace faces when we want to
add an fd and make the same syscall return that value[1].

Currently two different ioctl() calls are needed by the process handling
the syscalls (agent) for another userspace process (target). Therefore,
it is possible for the agent to do the first ioctl to add a file
descriptor but the target is interrupted (EINTR) before the agent does
the second ioctl() call.

This patch adds a flag to the ADDFD ioctl() so it adds the fd and
returns that value atomically to the target program, as suggested by
Kees Cook[2]. This is done by simply allowing
seccomp_do_user_notification() to add the fd and return that in this
case. Therefore, in this case the target wakes up from the wait in
seccomp_do_user_notification() either to interrupt the syscall or to add
the fd and return it.

The struct seccomp_notif_resp, used when doing SECCOMP_IOCTL_NOTIF_SEND
ioctl() to send a response to the target, has three more fields that we
don't allow to set when doing the addfd ioctl() to also return. The
reason to disallow each field is:
 * val: This will be set to the new allocated fd.
 * error: If this is non-zero, the value is ignored. Therefore,
   it is pointless in this case as we want to return the value.
 * flags: The only flag is to let userspace continue to execute the
   syscall. This seems pointless, as we want the syscall to return the
   allocated fd.

[1]: https://lists.linuxfoundation.org/pipermail/containers/2020-November/042718.html
[2]: https://lists.linuxfoundation.org/pipermail/containers/2020-December/042797.html

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
This just adds a test to verify that when using the new introduced flag
to ADDFD, a valid fd is added and returned as the syscall result.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
@rata rata force-pushed the rata/seccomp-atomic-addfd-ret branch from bd9996f to 2f978a4 Compare January 14, 2021 12:34
@rata rata changed the title seccomp: Support atomic "addfd and return" seccomp: Support atomic "addfd and send" Jan 14, 2021
@rata
Copy link
Author

rata commented Jan 14, 2021

Github is crazy detecting so many changes, will re-open to see if that fixes it

@rata rata closed this Jan 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.