Skip to content

SocketAsyncEngine.Unix: Use eventfd(2) to signal shutdown#1538

Closed
lpereira wants to merge 1 commit into
dotnet:masterfrom
lpereira:use-eventfd-to-signal-shutdown
Closed

SocketAsyncEngine.Unix: Use eventfd(2) to signal shutdown#1538
lpereira wants to merge 1 commit into
dotnet:masterfrom
lpereira:use-eventfd-to-signal-shutdown

Conversation

@lpereira
Copy link
Copy Markdown
Contributor

@lpereira lpereira commented Jan 9, 2020

Since we don't really care what is being written to the pipe to wake up the event loop, we can substitute it for an eventfd(2) on Linux and save ourselves a file descriptor per SocketAsyncEngine. (For manycore systems, that's potentially dozens of file descriptors that aren't needed anymore.)

(It's a tiny wee little bit more efficient to write to an eventfd than it is writing to a pipe, although in this case it doesn't matter that much as we're shutting down the engines.)

Also, I don't know C# very well, so the changes in SocketAsyncEngine.RequestEventLoopShutdown() might not be correct.

@lpereira lpereira requested review from stephentoub and tmds January 9, 2020 21:25
@lpereira lpereira force-pushed the use-eventfd-to-signal-shutdown branch from 4c25f28 to 6e6c3ef Compare January 9, 2020 22:26
@lpereira
Copy link
Copy Markdown
Contributor Author

lpereira commented Jan 9, 2020

I don't understand the build error in the Linux x64 CI environment. eventfd is correctly found, so HAVE_EVENTFD is defined to 1, the header file is included, which defines the prototype for eventfd(unsigned int, int). The wrapper here takes a uint32_t and a int32_t and passes it to eventfd(), but the error mentions a sign conversion that isn't happening there (and the only case where I can see that happening is if the prototype isn't defined and the argument types default to int).

Since we don't really care what is being written to the pipe to wake up
the event loop, we can substitute it for an eventfd(2) on Linux and
save ourselves a file descriptor per SocketAsyncEngine.  (For manycore
systems, that's potentially dozens of file descriptors that aren't
needed anymore.)

(It's a tiny wee little bit more efficient to write to an eventfd than
it is writing to a pipe, although in this case it doesn't matter that
much.)
@lpereira lpereira force-pushed the use-eventfd-to-signal-shutdown branch from 6e6c3ef to 4774829 Compare January 10, 2020 00:14
@Wraith2
Copy link
Copy Markdown
Contributor

Wraith2 commented Jan 10, 2020

It could be the return type of eventfd being converted from uint to int on return

//
// Since eventfd is a Linux-only feature, there's no need to define our own
// flag values: pass these values through as the Linux system call expects.
//
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We like to treat the shims as platform-agnostic, and the values we pass in here be for the PAL rather than for a particular platform. If the values we define happen to be the same as for that platform, then the PAL implementation can avoid doing any casting and just pass them through, but that's up to the PAL.

What that actually means for your changes:

  • Remove this comment
  • Define these values in pal_io.h
    _ Add static asserts that the PAL EFD_SEMAPHORE matches the target platform's EFD_SEMAPHORE

That will keep the PAL platform-agnostic from a managed code perspective while also not adding runtime overhead.

We do this elsewhere.

int32_t flags); // 0 for defaults or PAL_O_CLOEXEC for close-on-exec

/**
* Creates an eventfd on Linux. Returns error everywhere else.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Creates an eventfd on Linux. Returns error everywhere else.
* Creates an eventfd if available. Returns ENOTSUP error everywhere else.

* Returns 0 for success, -1 for failure. Sets errno on failure.
*/
DLLEXPORT int32_t SystemNative_EventFD(uint32_t initialValue,
int32_t flags); // Passes through to eventfd() without conversion
Copy link
Copy Markdown
Member

@stephentoub stephentoub Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going along with my previous comment:

Suggested change
int32_t flags); // Passes through to eventfd() without conversion
int32_t flags);

int bytesWritten = Interop.Sys.Write(_shutdownWritePipe, &b, 1);
if (bytesWritten != 1)
byte[] wakeThreadUp = {0, 0, 0, 0, 0, 0, 0, 1};
fixed (byte *b = &wakeThreadUp[0])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates an 8-byte array on the heap. That isn't necessary. You can instead make it:

byte* wakeThreadUp = stackalloc byte[8] { 0, 0, 0, 0, 0, 0, 0, 1 };

and not need the fixed at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Still working on the intuition to understand how C# code is generated/executed.

// When using eventfd, read/write ends of the "pipe" have the same file
// descriptor.
//
Interop.Sys.Close((IntPtr)_shutdownReadPipe);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better as:

if (_shutdownReadPipe == _shutdownWritePipe)
{
    if (_shutdownReadPipe != -1)
    {
        ...
    }
}

if (_shutdownWritePipe != -1)
{
Interop.Sys.Close((IntPtr)_shutdownWritePipe);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better yet:

if (_shutdownReadPipe != -1)
{
    Interop.Sys.Close((IntPtr)_shutdownReadPipe);
}
if (_shutdownReadPipe != _shutdownWritePipe && _shutdownWritePipe != -1)
{
    Interop.Sys.Close((IntPtr)_shutdownWritePipe);
}

int bytesWritten = Interop.Sys.Write(_shutdownWritePipe, b, 8);
if (bytesWritten != 8)
{
throw new InternalException(bytesWritten);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspect. In general write isn't guaranteed to write everything you pass in, which is why it returns how many bytes it wrote. When we were writing 1 byte, this was fine, because it won't return 0 unless there was an error, but now that it's being called with 8, it could theoretically successfully return 1 through 7. Does something prevent that from being the case when the fd is a pipe or an eventfd, on all platforms we target now and in the future?

Copy link
Copy Markdown
Contributor Author

@lpereira lpereira Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of that. However, for eventfd, this is always fine (it's always ready to accept a 64-bit int). When using pipes, it's very unlikely that the buffer wouldn't have at least 8 bytes free, especially since these pipes are used only during shutdown and this is the only thing that's ever written to them (to make it more robust on Linux the pipe could be opened with O_DIRECT, so read/writes are atomic, but this doesn't work under WSL1 and BSDs AFAICT). So this is fine.

Copy link
Copy Markdown
Member

@stephentoub stephentoub Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very unlikely

I don't think "very unlikely" is sufficient for code that we need to be extremely robust.

I suggest instead doing:

Debug.Assert(bytesWritten == 8);
if (bytesWritten < 1)
{
    throw new InternalException(bytesWritten);
}

or something like that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipe buffers are at least a page in all systems we care about. It's not very unlikely, it's very unlikely.

But, yeah, the proposed change is fine. Even if the assert is removed in release builds, this would fine: the amount of bytes written here (on pipes, where short writes are possible) doesn't really matter as it's only used to wake the thread blocked on poll().

@stephentoub
Copy link
Copy Markdown
Member

We've been considering setting the number of engines to be 1 always instead of scaling with proccount (cc: @VSadov). If we do that, the benefits of this change would be significantly reduced, to saving 1 file descriptor per process rather than N file descriptors per process, and I'm not sure it'll be worth the additional code and complication it brings to the codebase. If we decide to continue scaling, seems fine.

@lpereira
Copy link
Copy Markdown
Contributor Author

We've been considering setting the number of engines to be 1 always instead of scaling with proccount (cc: @VSadov). If we do that, the benefits of this change would be significantly reduced, to saving 1 file descriptor per process rather than N file descriptors per process, and I'm not sure it'll be worth the additional code and complication it brings to the codebase. If we decide to continue scaling, seems fine.

Agreed. If that's what's planned, I can just close this PR then.

@VSadov
Copy link
Copy Markdown
Member

VSadov commented Jan 10, 2020

Yes. It is likely that the number of engines will be revised to 1 or some modest number close to that.
We see evidence that less is generally better, except for some extreme cases.

@tmds
Copy link
Copy Markdown
Member

tmds commented Jan 10, 2020

It is likely that the number of engines will be revised to 1 or some modest number close to that.
We see evidence that less is generally better, except for some extreme cases.

@VSadov @stephentoub dotnet/corefx#36693 is where I previously looked at reducing count to 1. Performance was for some benchmarks a bit better. For fortunes with 65k connections there was a clear regression.

With the current implementation, multiple engines are created only when there are at least 6 cores, and existing engines have 32 sockets in use. Probably these systems can spare a file descriptor.

@lpereira
Copy link
Copy Markdown
Contributor Author

Alright, closing this then. Thanks, folks!

@lpereira lpereira closed this Jan 10, 2020
@VSadov
Copy link
Copy Markdown
Member

VSadov commented Jan 10, 2020

@tmds - yes, @stephentoub pointed me to dotnet/corefx#36693 because I started making the same conclusions - 1 poller is nearly always enough.

However, there is also the 65K connection scenario, so this is not so simple.

@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants