
Add Lock-free Circular buffer #52

Merged
rnburn merged 16 commits into open-telemetry:master from rnburn:circular-buffer on Apr 13, 2020

Conversation

@rnburn
Contributor

@rnburn rnburn commented Mar 19, 2020

Adds a lock-free buffer that can be used for spans to ensure that a thread with instrumentation in a heavily concurrent environment doesn't get blocked on a mutex.

Includes a benchmark that compares performance to a baseline circular buffer with a mutex. Here are the results:

2020-03-19 00:21:28
Running bazel-out/host/bin/sdk/test/common/circular_buffer_benchmark
Run on (36 X 3334.92 MHz CPU s)
CPU Caches:
  L1 Data 32K (x18)
  L1 Instruction 32K (x18)
  L2 Unified 1024K (x18)
  L3 Unified 25344K (x1)
Load Average: 13.70, 13.36, 15.09
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_BaselineBuffer/1   11043103 ns        86826 ns          100
BM_BaselineBuffer/2    4852504 ns        80999 ns         1370
BM_BaselineBuffer/4    4862858 ns       132627 ns         1075
BM_LockFreeBuffer/1    3631460 ns        62651 ns         1000
BM_LockFreeBuffer/2    2272364 ns        72407 ns         1000
BM_LockFreeBuffer/4    2516072 ns       125403 ns         1183
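For readers unfamiliar with the technique, a lock-free multi-producer single-consumer ring can be sketched as follows. This is a simplified model, not the PR's implementation: `MpscRing`, its member names, and the spin in `Consume` are illustrative. Producers claim a slot index with a compare-and-swap on the head counter, then publish the element by storing its pointer into the slot; the single consumer spins until the slot is non-null, so a claimed-but-unwritten slot is never read.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified multi-producer single-consumer ring (illustrative, not the
// PR's exact code). Producers race on head_ with CAS; the slot's atomic
// pointer acts as the publish flag for the consumer.
template <class T>
class MpscRing
{
public:
  explicit MpscRing(size_t capacity) : data_(capacity) {}

  ~MpscRing()
  {
    for (auto &slot : data_)
    {
      delete slot.load();
    }
  }

  // Any thread may call Add. Returns false when the buffer is full.
  bool Add(std::unique_ptr<T> &&element) noexcept
  {
    while (true)
    {
      uint64_t head = head_.load(std::memory_order_relaxed);
      uint64_t tail = tail_.load(std::memory_order_acquire);
      if (head - tail >= data_.size())
      {
        return false;  // full
      }
      // Claim slot `head`; on CAS failure another producer won the race.
      if (head_.compare_exchange_weak(head, head + 1, std::memory_order_acq_rel))
      {
        // Publish the element; the consumer waits for this store.
        data_[head % data_.size()].store(element.release(),
                                         std::memory_order_release);
        return true;
      }
    }
  }

  // Only a single consumer thread may call Consume.
  std::unique_ptr<T> Consume() noexcept
  {
    uint64_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire))
    {
      return nullptr;  // empty
    }
    auto &slot = data_[tail % data_.size()];
    T *ptr;
    // The slot was claimed, but the producer may not have published yet.
    while ((ptr = slot.exchange(nullptr, std::memory_order_acquire)) == nullptr)
    {
    }
    tail_.store(tail + 1, std::memory_order_release);
    return std::unique_ptr<T>{ptr};
  }

private:
  std::vector<std::atomic<T *>> data_;
  std::atomic<uint64_t> head_{0};
  std::atomic<uint64_t> tail_{0};
};
```

The benchmark's gap is consistent with this structure: contended producers retry a CAS instead of blocking on a mutex.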

@reyang
Member

reyang commented Mar 19, 2020

@rnburn would you help clarify where this buffer would get used? I guess in the batch exporter (BatchExporterProcessor) scenario; I'd like to confirm. Thanks.

@rnburn
Contributor Author

rnburn commented Mar 19, 2020

When you finish a span, it would get added to a version of this buffer.

@reyang
Member

reyang commented Mar 19, 2020

When you finish a span, it would get added to a version of this buffer.

My understanding is that we don't put spans into a queue by default. Here's what the spec says:
For exporters that are expected to work synchronously, SimpleExporterProcessor will be used and there will be no queue at all.
For exporters that are designed to work asynchronously (e.g. batch exporting to a REST endpoint), BatchExporterProcessor would be used, and this is where the queue comes in.

@reyang
Member

reyang commented Mar 19, 2020

Here's a Python example of how a synchronous exporter is used.
In this case there is no queue and no worker thread; each span-end event triggers a console write and flush immediately.

@rnburn
Contributor Author

rnburn commented Mar 19, 2020

You will definitely want some kind of buffering for most use cases.

For exporters that are expected to work synchronously, SimpleExporterProcessor will be used and there will be no queue at all.

If the span is processed synchronously without buffering, that will mean possibly blocking the thread that finished the span for an indefinite amount of time. That would be unacceptable for many applications.

But I see buffering in the Python code here.

@reyang
Member

reyang commented Mar 19, 2020

You will definitely want some kind of buffering for most use cases.
If the span is processed synchronously without buffering, that will mean blocking the thread that finished the span for an indefinite amount of time. That would be unacceptable for many applications.

Depends on the scenario. For people who use LTTng or ETW, they would be shocked if you put an in-proc buffer in front.
For people who work with HTTP/HTTPS exporters, a batch exporter is the normal expectation, and this is where the buffer comes in.

But I see buffering in the Python code here.

Yes, and this is specific for BatchExportSpanProcessor. If you use SimpleExportSpanProcessor, there would be no buffer at all.

Comment thread sdk/src/common/atomic_unique_ptr.h Outdated
/**
 * Atomically swap the pointer with another.
 * @param ptr the pointer to swap with
 */
void Swap(std::unique_ptr<T> &ptr) noexcept { ptr.reset(ptr_.exchange(ptr.release())); }
Contributor

Would s/ptr/other/ improve readability?

Contributor Author

Renamed
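For context, the pattern under discussion can be shown with a minimal self-contained sketch of an atomic owner of a `unique_ptr` (the class and member names here are illustrative, not the PR's exact code):

```cpp
#include <atomic>
#include <memory>

// Minimal sketch of an atomically-owned unique_ptr, illustrating the
// Swap pattern from the thread above (illustrative, not the PR's API).
template <class T>
class AtomicUniquePtr
{
public:
  AtomicUniquePtr() noexcept = default;

  ~AtomicUniquePtr() { delete ptr_.load(); }

  // Atomically exchange ownership with `other`: after the call, `other`
  // owns what this held, and this holds what `other` held.
  void Swap(std::unique_ptr<T> &other) noexcept
  {
    other.reset(ptr_.exchange(other.release()));
  }

private:
  std::atomic<T *> ptr_{nullptr};
};
```

Because the handoff is a single atomic exchange, concurrent swaps on the same object serialize without a lock, and ownership is never duplicated or dropped.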


bool Add(std::unique_ptr<T> &&element) noexcept
{
std::lock_guard<std::mutex> lock_gaurd{mutex_};
Contributor

Nit: gaurd -> guard

Member

Trying to cover this automatically with #54.

Contributor Author

Oops. Fixed.

@rnburn
Contributor Author

rnburn commented Mar 19, 2020

Ok, but this PR isn't about mandating a buffer anywhere.

It only provides a lock-free buffer for the scenarios that do buffering.

Comment thread sdk/src/common/circular_buffer.h Outdated
Co-Authored-By: Reiley Yang <reyang@microsoft.com>
Member

@reyang reyang left a comment

LGTM.

@rnburn
Contributor Author

rnburn commented Apr 9, 2020

@g-easy @pyohannes - could you guys take a look? thanks.

Contributor

@pyohannes pyohannes left a comment

Looks good to me. I have one small question that's not critical.

template <class Callback>
void Consume(size_t n, Callback callback) noexcept
{
  assert(n <= static_cast<size_t>(head_ - tail_));
Contributor

Is it intended that users of CircularBuffer have to check the buffer size each time before calling Consume, instead of n being the maximum number of elements consumed?

Contributor Author

You may not always want to consume the maximum number of elements. For example, you might first call Peek to get a range of elements in the buffer, then call Consume to remove those elements if they were processed successfully. Passing the size is necessary since it's a concurrent multi-producer, single-consumer buffer, and elements could have been added between the calls to Peek and Consume.
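The Peek-then-Consume contract described here can be modeled single-threaded (the real CircularBuffer is lock-free and concurrent; `PeekConsumeModel` and its methods are illustrative stand-ins): Peek snapshots the current contents, and Consume(n) removes exactly the n oldest elements, leaving anything added after the snapshot untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Single-threaded model of the Peek/Consume contract (illustrative
// stand-in for the lock-free CircularBuffer in this PR).
template <class T>
class PeekConsumeModel
{
public:
  void Add(T value) { data_.push_back(std::move(value)); }

  // Snapshot of the current contents, oldest first.
  std::vector<T> Peek() const { return {data_.begin(), data_.end()}; }

  // Remove exactly the n oldest elements; elements added after a Peek
  // snapshot of size n are untouched.
  void Consume(size_t n)
  {
    assert(n <= data_.size());
    data_.erase(data_.begin(), data_.begin() + n);
  }

  size_t size() const { return data_.size(); }

private:
  std::deque<T> data_;
};
```

Passing the snapshot's size to Consume is what keeps a concurrently-added element from being dropped before it is exported.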

