Support async networking #63
What major use cases for asynchronous networking do you have in mind? Is this intended to be used by exporters?
Asynchronous networking is more efficient. If the tracer uses synchronous networking, it can block while writing data to one socket even when it could make progress on other sockets or events. Exporters can use the interface, but they're also welcome to do their own synchronous networking (though that will block the background thread). Additionally, asynchronous networking is important for allowing an application to exit quickly. If you do a synchronous write to an endpoint that's not responding and the app exits, the app can hang until the write's timeout expires.
Yes, I completely agree. I mostly wonder where in the SDK it will be used, apart from exporters.
It could also be used in the SDK to set up the timers that regularly trigger the span buffer to flush (and similarly for metrics). Asynchronous DNS resolution can also be supported over an interface like this with c-ares.
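To make the timer use case concrete, here is a minimal sketch of a timer queue that a single background thread could drive to trigger periodic flushes. `TimerQueue`, `Schedule`, and `Run` are hypothetical names chosen for illustration; this is not the dispatcher interface added in this PR, and it assumes all calls come from one thread.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <thread>
#include <utility>

// Hypothetical single-threaded timer queue (illustration only).
class TimerQueue
{
public:
  using Clock = std::chrono::steady_clock;

  // Schedule `callback` to run once, `delay` from now.
  void Schedule(Clock::duration delay, std::function<void()> callback)
  {
    events_.emplace(Clock::now() + delay, std::move(callback));
  }

  // Run callbacks in deadline order until no timers remain.
  void Run()
  {
    while (!events_.empty())
    {
      auto next = events_.begin();
      std::this_thread::sleep_until(next->first);
      // Take the callback out before erasing so it may safely reschedule itself.
      auto callback = std::move(next->second);
      events_.erase(next);
      callback();
    }
  }

private:
  std::multimap<Clock::time_point, std::function<void()>> events_;
};
```

A periodic flush can then be expressed as a callback that does its work and calls `Schedule` again with the same interval.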
| }), | ||
| make_commands = select({ | ||
| "@io_opentelemetry_cpp//bazel:windows": ["MSBuild.exe INSTALL.vcxproj"], | ||
| "//conditions:default" : None, |
There was a problem hiding this comment.
Looks like this file didn't go through buildifier.
There was a problem hiding this comment.
Update format script to run on these files
| /** | ||
| * Run the event loop until. | ||
| */ | ||
| virtual void Run() noexcept = 0; |
There was a problem hiding this comment.
Where is this documented? As a user of the SDK, what do I need to do with the Dispatcher?
There was a problem hiding this comment.
It's intended to support a design where you have a single background thread and asynchronous networking. The background thread would make successive calls to the system's I/O multiplexing function (e.g. epoll or kqueue) and invoke callbacks as events become ready.
If you're a user of the SDK and you want to use async networking for your vendor exporter, you'd call CreateFileEvent and pass the file descriptors for the sockets you used to connect to your endpoint.
Or, if you don't want to do asynchronous I/O, you can ignore the dispatcher, use synchronous sockets, and block the background thread when you want to do I/O (less efficient, and it can keep the process from exiting, but it is supported).
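As a rough illustration of the design described above, the sketch below registers file descriptors with callbacks and dispatches them from a single `poll()` call. It assumes a POSIX system; `MiniDispatcher`, `AddFileEvent`, and `RunOnce` are hypothetical names and do not reflect the signatures of the `Dispatcher` or `CreateFileEvent` in this PR.

```cpp
#include <poll.h>
#include <unistd.h>  // read(), typically used inside callbacks

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature dispatcher (illustration only, single-threaded).
class MiniDispatcher
{
public:
  using Callback = std::function<void(int)>;

  // Register `callback` to run whenever `fd` becomes readable.
  void AddFileEvent(int fd, Callback callback)
  {
    pollfd entry{};
    entry.fd     = fd;
    entry.events = POLLIN;
    fds_.push_back(entry);
    callbacks_.push_back(std::move(callback));
  }

  // Wait up to `timeout_ms` for readiness, then invoke the ready callbacks.
  // Returns the number of callbacks invoked (0 on timeout or error).
  int RunOnce(int timeout_ms)
  {
    const int ready = ::poll(fds_.data(), static_cast<nfds_t>(fds_.size()), timeout_ms);
    if (ready <= 0)
    {
      return 0;
    }
    int invoked = 0;
    for (std::size_t i = 0; i < fds_.size(); ++i)
    {
      if ((fds_[i].revents & POLLIN) != 0)
      {
        callbacks_[i](fds_[i].fd);
        ++invoked;
      }
    }
    return invoked;
  }

private:
  std::vector<pollfd> fds_;
  std::vector<Callback> callbacks_;
};
```

The background thread would call `RunOnce` in a loop; an exporter would register its connected socket and do its reads inside the callback instead of blocking.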
There was a problem hiding this comment.
Can we capture this in a doc please? :)
| t1 = std::chrono::steady_clock::now(); | ||
| dispatcher.Run(); | ||
| auto duration = t2 - t1; | ||
| EXPECT_TRUE(duration > std::chrono::milliseconds{50}); |
There was a problem hiding this comment.
On different CI systems we might need to adjust the tolerance of timing.
Consider making this a configurable thing which we could leverage in other test cases (in particular metrics and async exporter).
There was a problem hiding this comment.
I tried to pick a constant that should be large enough for most environments, though we can probably go larger without slowing down the tests too much.
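One way to make the tolerance configurable, sketched under assumptions: scale every test timeout by a factor read from the environment, so slow CI systems can raise it without code changes. The variable name `TEST_TIMING_SCALE` and both helper functions are hypothetical and not part of this PR or of any existing test utility.

```cpp
#include <chrono>
#include <cstdlib>

// Hypothetical helper: read a positive scale factor from TEST_TIMING_SCALE.
// Falls back to 1.0 when the variable is unset or not a positive number.
inline double TimingScale()
{
  const char *raw = std::getenv("TEST_TIMING_SCALE");
  if (raw == nullptr)
  {
    return 1.0;
  }
  char *end          = nullptr;
  const double value = std::strtod(raw, &end);
  return (end != raw && value > 0.0) ? value : 1.0;
}

// Hypothetical helper: scale a base timeout by the configured factor.
inline std::chrono::milliseconds ScaledTimeout(std::chrono::milliseconds base)
{
  return std::chrono::duration_cast<std::chrono::milliseconds>(base * TimingScale());
}
```

A test would then compare against `ScaledTimeout(std::chrono::milliseconds{50})` rather than a hard-coded constant.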
Just needs an autoformat to pass CI.
| void Dispatcher::Exit() noexcept {} | ||
| void Dispatcher::Exit() noexcept | ||
| { | ||
| running_ = false; |
There was a problem hiding this comment.
Not meant to be threadsafe?
There was a problem hiding this comment.
Yes, meant for interacting with only a single background thread.
| while (running_ && !events_.empty()) | ||
| { | ||
| auto next_event = events_.begin(); | ||
| std::this_thread::sleep_until(next_event->first); |
There was a problem hiding this comment.
How can this sleep be interrupted?
There was a problem hiding this comment.
I'd recommend a short polling interval over an interrupt mechanism.
I'd rather people use the libevent implementation, and this is meant to be a drop-in replacement that provides the same interface and allows you to use the same codebase but without an event library (in case you only care about synchronous networking and don't want to add an external dependency).
When the dispatcher is built on top of libevent, the background thread makes successive calls to the system's polling function (e.g. epoll or an equivalent). Epoll blocks waiting for either a timeout or an event on a list of file descriptors.
I believe it is possible to artificially interrupt epoll if you create a dummy file descriptor for it to include in its list and then close the file descriptor, forcing an event.
But I prefer, and I think it's simpler, to use a polling function with a short timer interval that monitors for termination, closure, and flush events. That is basically what's done here:
https://github.com/lightstep/lightstep-tracer-cpp/blob/master/src/recorder/stream_recorder/stream_recorder_impl.cpp#L57-L82
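The short-polling-interval idea can be sketched as follows: instead of one long sleep, wait in small slices and check an exit flag between slices. `SleepUntilOrExit` is a hypothetical helper, and the `std::atomic<bool>` flag is an assumption made so the flag could be set from another thread; the `running_` member in this PR is a plain flag intended for a single thread.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: sleep until `deadline` in slices of at most
// `poll_interval`, returning early if `exit_requested` becomes true.
// Returns true if the deadline was reached, false if exit was requested.
inline bool SleepUntilOrExit(std::chrono::steady_clock::time_point deadline,
                             const std::atomic<bool> &exit_requested,
                             std::chrono::milliseconds poll_interval = std::chrono::milliseconds{10})
{
  using Clock = std::chrono::steady_clock;
  while (!exit_requested.load(std::memory_order_acquire))
  {
    const auto now = Clock::now();
    if (now >= deadline)
    {
      return true;
    }
    // Never sleep past the deadline, and never longer than one slice.
    std::this_thread::sleep_for(std::min<Clock::duration>(deadline - now, poll_interval));
  }
  return false;
}
```

The trade-off is latency versus wake-ups: exit is noticed within roughly one `poll_interval`, at the cost of waking the thread that often while idle.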
There was a problem hiding this comment.
sleep_until may hang "forever" in gcc-4.8 and vs2017 15.4 when system clock changes, i.e. after NTP time sync.
There was a problem hiding this comment.
It uses the steady clock, which shouldn't have that problem.
There was a problem hiding this comment.
it is possible to interrupt epoll
According to the man page, epoll_wait can be interrupted by a signal handler;
you create a dummy file descriptor for it to include in its list and then close the file descriptor, forcing an event.
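For comparison, a commonly used variant of the dummy-descriptor idea is to write a byte to a pipe rather than close a descriptor, so the read end becomes readable and the blocked wait returns. The sketch below assumes a POSIX system and uses `poll()` for brevity; `WakeupPipe`, `Notify`, and `WaitFor` are hypothetical names and not part of this PR.

```cpp
#include <poll.h>
#include <unistd.h>

// Hypothetical wake-up channel built on a pipe (illustration only).
class WakeupPipe
{
public:
  WakeupPipe() noexcept
  {
    if (::pipe(fds_) != 0)
    {
      fds_[0] = fds_[1] = -1;  // creation failed; WaitFor will only time out
    }
  }

  ~WakeupPipe()
  {
    if (fds_[0] >= 0) ::close(fds_[0]);
    if (fds_[1] >= 0) ::close(fds_[1]);
  }

  WakeupPipe(const WakeupPipe &)            = delete;
  WakeupPipe &operator=(const WakeupPipe &) = delete;

  // Make the read end readable, waking a thread blocked in WaitFor.
  bool Notify() noexcept
  {
    const char byte = 1;
    return ::write(fds_[1], &byte, 1) == 1;
  }

  // Block for up to `timeout_ms`. Returns true if woken by Notify,
  // false on timeout or error.
  bool WaitFor(int timeout_ms) noexcept
  {
    pollfd entry{};
    entry.fd     = fds_[0];
    entry.events = POLLIN;
    if (::poll(&entry, 1, timeout_ms) <= 0)
    {
      return false;
    }
    char byte = 0;
    return ::read(fds_[0], &byte, 1) == 1;  // drain the wake-up byte
  }

private:
  int fds_[2];
};
```

In a real event loop the read end would be one entry among the monitored descriptors, and another thread would call `Notify` to request shutdown or a flush.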
Codecov Report
@@ Coverage Diff @@
## main #63 +/- ##
==========================================
- Coverage 93.61% 90.70% -2.91%
==========================================
Files 71 85 +14
Lines 1754 1894 +140
==========================================
+ Hits 1642 1718 +76
- Misses 112 176 +64
Assigning to @lalitb, as discussed in the SIG meeting on October 21st. The main challenge here would be rebasing and making all GH actions pass.
Triaged Dec 7 2020: This is a good change to integrate but non-trivial. The maintainers need to figure out what items are blocked on this. This is not an immediate priority. Will keep open until re-evaluated.
@lalitb do we need rework on this PR to get it merged?
Keeping it open, for someone to take a dig at it.
[Code Health] Clang Tidy cleanup, Part 2 (open-telemetry#3038)
Adds an interface for managing async file and timer events within the io polling system calls.
Will support asynchronous networking for tracers and meters.
A concrete implementation is provided with libevent.
The dispatcher interface was derived from Envoy.