
Remove static global context_handler_ variable #322

Merged
reyang merged 13 commits into open-telemetry:master from pyohannes:global-context
Sep 21, 2020

Conversation

@pyohannes
Contributor

The main intention of this PR is to remove the static global context_handler_ variable, which caused linking problems in several instances: a static variable like this requires the context_handler_ symbol to be present in one of the translation units being linked, which conflicts with the header-only API approach.

To fix this, I added a singleton approach similar to what we have in place with the TracerProvider.

While working on that, I also cleaned things up a bit and separated the RuntimeContext into a RuntimeContext and a RuntimeContextStorage. The RuntimeContextStorage is the part that needs to be implemented to provide a custom context management strategy. I think this separation reduces complexity.
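The split described above, together with the singleton approach, can be sketched roughly as follows. This is a simplified illustration, not the PR's code: Context is reduced to a std::string, std::shared_ptr stands in for nostd::shared_ptr, and the names ThreadLocalStackStorage, GetCurrent, Attach and Detach are hypothetical. The storage is held in a function-local static inside a header-defined function, so no translation unit has to provide a definition for a static data member such as context_handler_.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Context = std::string;  // hypothetical stand-in for the real Context type

// Pluggable part: how the "current context" is stored.
class RuntimeContextStorage
{
public:
  virtual ~RuntimeContextStorage() = default;
  virtual Context GetCurrent() const = 0;
  virtual void Attach(const Context &context) = 0;
  virtual bool Detach() = 0;  // returns false if nothing is attached
};

// Default strategy (hypothetical): one stack of contexts per thread.
class ThreadLocalStackStorage : public RuntimeContextStorage
{
public:
  Context GetCurrent() const override { return Stack().empty() ? Context() : Stack().back(); }
  void Attach(const Context &context) override { Stack().push_back(context); }
  bool Detach() override
  {
    if (Stack().empty())
      return false;
    Stack().pop_back();
    return true;
  }

private:
  static std::vector<Context> &Stack()
  {
    thread_local std::vector<Context> stack;
    return stack;
  }
};

// Facade used by callers; it forwards to whichever storage is installed.
class RuntimeContext
{
public:
  static Context GetCurrent() { return Storage()->GetCurrent(); }
  static void Attach(const Context &context) { Storage()->Attach(context); }
  static bool Detach() { return Storage()->Detach(); }

  // Not synchronized: must be called before any other use of RuntimeContext.
  static void SetRuntimeContextStorage(std::shared_ptr<RuntimeContextStorage> storage)
  {
    assert(storage != nullptr);
    Storage() = std::move(storage);
  }

private:
  // Function-local static: lives entirely in the header, no out-of-line definition needed.
  static std::shared_ptr<RuntimeContextStorage> &Storage()
  {
    static std::shared_ptr<RuntimeContextStorage> storage =
        std::make_shared<ThreadLocalStackStorage>();
    return storage;
  }
};
```

A custom strategy would derive from RuntimeContextStorage and be installed once at startup, e.g. RuntimeContext::SetRuntimeContextStorage(std::make_shared<MyStorage>()) with MyStorage being a hypothetical subclass, before any other use of RuntimeContext. That ordering requirement matches the precondition this PR ends up documenting.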

Closes #321.

@pyohannes pyohannes requested a review from a team September 4, 2020 16:40
@codecov

codecov Bot commented Sep 4, 2020

Codecov Report

Merging #322 into master will increase coverage by 0.08%.
The diff coverage is 96.07%.


@@            Coverage Diff             @@
##           master     #322      +/-   ##
==========================================
+ Coverage   94.62%   94.70%   +0.08%     
==========================================
  Files         148      148              
  Lines        6655     6729      +74     
==========================================
+ Hits         6297     6373      +76     
+ Misses        358      356       -2     
Impacted Files Coverage Δ
api/test/context/runtime_context_test.cc 100.00% <ø> (ø)
api/test/plugin/dynamic_load_test.cc 100.00% <ø> (ø)
api/test/trace/scope_test.cc 100.00% <ø> (ø)
exporters/ostream/test/ostream_span_test.cc 100.00% <ø> (ø)
ext/test/zpages/threadsafe_span_data_test.cc 100.00% <ø> (ø)
ext/test/zpages/tracez_data_aggregator_test.cc 97.34% <ø> (ø)
ext/test/zpages/tracez_processor_test.cc 98.70% <ø> (ø)
sdk/test/trace/always_off_sampler_test.cc 100.00% <ø> (ø)
sdk/test/trace/always_on_sampler_test.cc 100.00% <ø> (ø)
sdk/test/trace/attribute_utils_test.cc 100.00% <ø> (ø)
... and 10 more

*
* @param storage a custom runtime context storage
*/
static void SetRuntimeContextStorage(nostd::shared_ptr<RuntimeContextStorage> storage) noexcept
Contributor


Is this method typically gonna be the very first method called before any telemetry is emitted by the process?

Contributor Author


Hard to say. I think it is, but I'm not sure about all the different scenarios.

Contributor Author


Is this method typically gonna be the very first method called before any telemetry is emitted by the process?

I documented that this method has to be called before any spans are created.

private:
static nostd::shared_ptr<RuntimeContextStorage> GetRuntimeContextStorage() noexcept
{
while (GetLock().test_and_set(std::memory_order_acquire))
Contributor

@maxgolov maxgolov Sep 4, 2020


I'm trying to understand the purpose of the lock here:

  • line 178: &GetStorage() is already atomic, since the static at line 180 is a magic static (aka function-local static). Since C++11, the initialization of magic statics is guaranteed to be thread-safe...

  • whereas storage at line 172 is on the stack...

Could you please clarify: what is the race we are trying to address? Is it a race between getting the storage for the first time and SetRuntimeContextStorage potentially being called at the same time?

I was thinking about a slightly less elegant but shorter solution. It won't require a lock if we declare that context storage customization MUST be done before any other telemetry API call:
https://github.com/maxgolov/opentelemetry-cpp/blob/4004c2d1ebc6bd9b1e55eaf9310726fc939f9de5/api/include/opentelemetry/context/runtime_context.h#L28

(Sorry, I didn't refactor it much. It's somewhat similar to the previous implementation, with the same method names as before. I did this in my fork to alleviate the issue.)

I like your solution overall. I just doubt that we really need the locking here.

Contributor Author


The race is between GetRuntimeContextStorage and SetRuntimeContextStorage. Without the lock, a concurrent SetRuntimeContextStorage might partially overwrite the shared_ptr while it is being copied, corrupting its reference count and causing memory corruption.

I went the safer way here. But I'm fine with adding a requirement that context storage customization MUST be done before any other telemetry API call. Then we can remove the lock and win a tiny bit of performance.

Member


Agreed. There is a race condition and possible corruption if GetRuntimeContextStorage and SetRuntimeContextStorage are called simultaneously, as line 163 is not an atomic operation. Although the lock is only a tiny bit of performance overhead, it can become significant if GetRuntimeContextStorage gets called for every span being processed. Adding a requirement that the context storage be initialized before any telemetry API calls would be good in that case.

Contributor Author


I removed the locking around GetRuntimeContextStorage and SetRuntimeContextStorage, and added a note to SetRuntimeContextStorage that the behavior is undefined when it is called after any spans have been created.

*/
static void SetRuntimeContextStorage(nostd::shared_ptr<RuntimeContextStorage> storage) noexcept
{
while (GetLock().test_and_set(std::memory_order_acquire))
Contributor


Wrap GetLock().test_and_set() and GetLock().clear() in an object's constructor and destructor to make sure clear() is always called here?

Contributor Author


Good point. I'll wait for the result of the discussion whether we'll need this lock at all. If so, I can wrap the calls in some kind of lock guard.
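A lock guard along the lines suggested in this thread could look like the sketch below: the constructor spins on a std::atomic_flag and the destructor always clears it, so every exit path releases the lock. SpinLockGuard and StorageHolder are hypothetical names and std::shared_ptr stands in for nostd::shared_ptr. The getter copies the pointer while the lock is held, which is what keeps a concurrent setter from overwriting it mid-copy.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// RAII wrapper: acquires the flag in the constructor, releases it in the destructor.
class SpinLockGuard
{
public:
  explicit SpinLockGuard(std::atomic_flag &flag) noexcept : flag_(flag)
  {
    while (flag_.test_and_set(std::memory_order_acquire))
    {
      std::this_thread::yield();  // back off while another thread holds the flag
    }
  }
  ~SpinLockGuard() { flag_.clear(std::memory_order_release); }

  SpinLockGuard(const SpinLockGuard &)            = delete;
  SpinLockGuard &operator=(const SpinLockGuard &) = delete;

private:
  std::atomic_flag &flag_;
};

// Hypothetical holder mirroring the locked getter/setter of the earlier revision.
template <class T>
class StorageHolder
{
public:
  static std::shared_ptr<T> Get() noexcept
  {
    SpinLockGuard guard(GetLock());
    return Slot();  // the copy is taken while the lock is held
  }

  static void Set(std::shared_ptr<T> value) noexcept
  {
    SpinLockGuard guard(GetLock());
    Slot() = std::move(value);
  }

private:
  static std::atomic_flag &GetLock() noexcept
  {
    static std::atomic_flag lock = ATOMIC_FLAG_INIT;
    return lock;
  }

  static std::shared_ptr<T> &Slot() noexcept
  {
    static std::shared_ptr<T> slot = std::make_shared<T>();
    return slot;
  }
};
```

Because release happens in the destructor, an early return between acquire and release can no longer leave the flag set. The PR ultimately dropped the lock and documented that the storage must be set before any spans are created, which avoids this cost on every call.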

Contributor Author


I removed the locks altogether.

Comment thread api/include/opentelemetry/context/runtime_context.h
// Pops the top Context off the stack and returns it.
Context Pop() noexcept
{
if (size_ <= 0)
Contributor


size_t is unsigned, so just check size_ == 0 here?

Contributor Author


Done.

}
int index = size_ - 1;
size_--;
return base_[index];
Contributor


Alternatively: return base_[--size_];?

Contributor


If size_ is of type size_t, should we check it before we decrement it? (Or would that be too paranoid a check?)

Contributor Author


I simplified the code. The check is already there (a few lines above).
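Combining the suggestions in this thread, a simplified Pop() might read as below. Only the size_ == 0 check and return base_[--size_]; come from the discussion; the ContextStack class, its std::vector backing store, Push(), Size(), and returning an empty Context on underflow are assumptions made to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Context = std::string;  // hypothetical stand-in for the real Context type

class ContextStack
{
public:
  void Push(const Context &context)
  {
    assert(size_ <= base_.size());  // invariant: size_ never exceeds the backing store
    if (size_ == base_.size())
      base_.push_back(context);
    else
      base_[size_] = context;  // reuse a slot left behind by an earlier Pop()
    ++size_;
  }

  // Pops the top Context off the stack and returns it.
  Context Pop()
  {
    // size_ is unsigned, so "size_ <= 0" can only ever mean "size_ == 0".
    if (size_ == 0)
      return Context();  // hypothetical underflow behavior: empty context
    // The check above guarantees size_ >= 1, so the decrement cannot wrap around.
    return base_[--size_];
  }

  std::size_t Size() const noexcept { return size_; }

private:
  std::vector<Context> base_;
  std::size_t size_ = 0;
};
```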


static RuntimeContextStorage *GetDefaultStorage() noexcept
{
return new ThreadLocalContextStorage();
Contributor


I'm wondering whether it is necessary to create a new ThreadLocalContextStorage() on every call, or whether we should just keep one global instance?

Contributor Author


That call is used to initialize the one global instance. It's only called once to initialize a static variable.
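The distinction is that GetDefaultStorage() does allocate a new object per call, but it is only ever called as the initializer of a function-local static, and that initializer runs once, on first use. The sketch below demonstrates this with hypothetical names (DefaultStorage, CreateDefaultStorage, GetStorage, g_factory_calls) in place of the real types. Since C++11, that one-time initialization is also thread-safe.

```cpp
#include <cassert>
#include <memory>

struct DefaultStorage
{};  // hypothetical stand-in for ThreadLocalContextStorage

int g_factory_calls = 0;  // counts how many times the factory has run

// Like GetDefaultStorage(): allocates a new object on every call.
DefaultStorage *CreateDefaultStorage()
{
  ++g_factory_calls;
  return new DefaultStorage();
}

// The factory only initializes the function-local static, so it runs once,
// on the first call to GetStorage(); later calls return the same instance.
std::shared_ptr<DefaultStorage> GetStorage()
{
  static std::shared_ptr<DefaultStorage> storage(CreateDefaultStorage());
  return storage;
}
```

Calling GetStorage() any number of times returns the same pointer, and g_factory_calls stays at 1.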

@pyohannes pyohannes added the pr:please-merge This PR is ready to be merged by a Maintainer (rebased, CI passed, has enough valid approvals, etc.) label Sep 14, 2020
@pyohannes
Contributor Author

I removed all locks around the RuntimeContextStorage getter and setter, and I documented that setting a RuntimeContextStorage after spans have already been created may result in undefined behavior.

This is ready for another review round, @open-telemetry/cpp-approvers.

@reyang reyang removed the pr:please-merge This PR is ready to be merged by a Maintainer (rebased, CI passed, has enough valid approvals, etc.) label Sep 18, 2020
@reyang reyang added the pr:please-merge This PR is ready to be merged by a Maintainer (rebased, CI passed, has enough valid approvals, etc.) label Sep 21, 2020
@reyang reyang merged commit a39b9b3 into open-telemetry:master Sep 21, 2020
GerHobbelt pushed a commit to GerHobbelt/opentelemetry-cpp that referenced this pull request Jun 17, 2025
…what_you_use-0.x

Update dependency depend_on_what_you_use to v0.9.0

Labels

pr:please-merge This PR is ready to be merged by a Maintainer (rebased, CI passed, has enough valid approvals, etc.)


Development

Successfully merging this pull request may close these issues.

Replace static RuntimeContext::context_handler_ global in the API

5 participants