Skip to content

Conversation

@nickrmc83
Copy link

The changes in this PR prevent close(2) from being called multiple times against a single file descriptor. On Linux, but not necessarily other posix boxen, calling close() is considered "wrong" (their words):

Retrying the close() after a failure return is the wrong thing to
do, since this may cause a reused file descriptor from another
thread to be closed. This can occur because the Linux kernel
always releases the file descriptor early in the close operation,
freeing it for reuse; the steps that may return an error, such as
flushing data to the filesystem or device, occur only later in the
close operation.

The PR changes here provide a speculative fix to issues such as this one in osquery. The error from RocksDB in osquery is EISDIR, an unexpected error in this context (see log). This has the distinct smell of an incorrect file descriptor in use. I hypothesize this is down to a call to ::Close(...) returning an IOStatus error and not setting fd to -1 and then subsequently, the destructors for PosixRandomRWFile and/or PosixDirectory, invoking close() on the now invalid file descriptor. Between the first and second call of close(), the file descriptor id can be recycled and used in a different context. This is described in open(2) which states:

The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.

The erroneous call to close() in the destructor could therefore impact reused file descriptor ids thus leading to unexpected and difficult to debug error cases.

The changes in this PR prevent [close(2)](https://man7.org/linux/man-pages/man2/close.2.html)
from being called multiple times against a single file descriptor. On Linux, but not necessarily
other posix boxen, calling `close()` is considered "wrong" (their words):

> Retrying the close() after a failure return is the wrong thing to
  do, since this may cause a reused file descriptor from another
  thread to be closed.  This can occur because the Linux kernel
  always releases the file descriptor early in the close operation,
  freeing it for reuse; the steps that may return an error, such as
  flushing data to the filesystem or device, occur only later in the
  close operation.

The PR changes here provide a speculative fix to issues such as [this one](osquery/osquery#7884) in osquery. The error from RocksDB in osquery is EISDIR,
an unexpected error in this context (see log). This has the distinct smell of an incorrect file descriptor in use. I hypothesize this is down to a call to ::Close(...) returning an IOStatus error and not setting fd to -1 and then subsequently, the destructors for `PosixRandomRWFile` and/or `PosixDirectory`, invoking close() on the now invalid file descriptor. Between the first and second call of close(), the file descriptor id can be recycled and used in a different context.
This is described in [open(2)](https://man7.org/linux/man-pages/man2/open.2.html) which states:

> The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.

The erroneous call to close() in the destructor **could** therefore impact reused file descriptor ids thus leading to unexpected and difficult to debug error cases.
@meta-cla
Copy link

meta-cla bot commented Nov 5, 2025

Hi @nickrmc83!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla
Copy link

meta-cla bot commented Nov 16, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant