@thomasameisel
Contributor

@thomasameisel thomasameisel commented Jan 20, 2026

Refactor logManager configuration handling to use default configuration directly.

We are seeing a crash that originates in the 1DS SDK. See this issue and this issue for more details on the crash, including crash logs. The crash occurs when a C++ ILogConfiguration object is assigned to a function-level static variable in the ODWLogManager class during static log configuration. I don't think this assignment is needed, and by removing it we can simplify this code and avoid the crash.

The issue was that we were creating a new ILogConfiguration object from the config NSDictionary that is passed into this function. Since nothing was holding onto this new ILogConfiguration object, it would get deallocated.

Instead we can merge the config NSDictionary into the static log configuration object. This way we don't need to hold onto the new ILogConfiguration object that is created because we are merging into the static log configuration. We also avoid copying the static log configuration object or needing to keep an extra static reference to it.
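The merge-in-place approach described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `Config`, `DefaultConfig`, `MergeInto`, `InitWithOverrides`, and the keys `name` and `maxSize` are hypothetical stand-ins for ILogConfiguration and the ODWLogManager initializer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for ILogConfiguration: a flat string-to-string map.
using Config = std::map<std::string, std::string>;

// Function-level static: constructed on first use and kept alive until
// program exit, so references returned here stay valid.
Config& DefaultConfig() {
    static Config config{{"name", "default"}, {"maxSize", "1"}};
    return config;
}

// Copy each override into the target. Values are stored as std::string,
// so the caller's map may be a short-lived temporary.
void MergeInto(Config& target, const Config& overrides) {
    for (const auto& kv : overrides) {
        target[kv.first] = kv.second;
    }
}

// Hypothetical initializer: merges into the shared configuration in place
// instead of copying the whole default configuration into another static.
const Config& InitWithOverrides(const Config& overrides) {
    Config& config = DefaultConfig();
    MergeInto(config, overrides);
    return config;
}
```

One consequence of this design is that a second call merges into whatever state the first call left behind rather than starting from fresh defaults.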

@thomasameisel thomasameisel requested a review from a team as a code owner January 20, 2026 18:58
- auto& defaultConfig = LogManager::GetLogConfiguration();
- logManagerConfig = defaultConfig;
+ // Get reference to the default configuration
+ ILogConfiguration& logManagerConfig = LogManager::GetLogConfiguration();
Contributor Author

I think this is the case, but I'd like to verify that we'll continue holding a strong reference to the default configuration object even if we don't assign it to a static variable.

Contributor Author

LogManagerBase::GetLogConfiguration is implemented with a function-level static variable, so the default configuration object stays alive even if we don't assign it to a static variable of our own.
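For reference, a getter built on a function-level static behaves as in the sketch below. `Settings`, `GetSettings`, and `g_constructions` are hypothetical names used only for illustration; the sketch assumes nothing about how the SDK's getter is written beyond the use of a function-level static.

```cpp
#include <cassert>

// Counts how many times the hypothetical Settings object is constructed.
int g_constructions = 0;

struct Settings {
    Settings() { ++g_constructions; }
    int level = 1;
};

// The static is initialized on the first call (thread-safe since C++11)
// and is not destroyed until program exit, so the returned reference
// remains valid without the caller keeping its own static copy.
Settings& GetSettings() {
    static Settings instance;
    return instance;
}

// Every call yields a reference to the same object.
bool SameObjectEveryCall() {
    return &GetSettings() == &GetSettings();
}
```

A change made through one reference is visible through every later call to `GetSettings()`, which is why holding a plain reference is enough.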

@thomasameisel
Contributor Author

I dug into the history of this crash some more and found this issue. The issue was that we were creating a new ILogConfiguration object from the config NSDictionary that is passed into this function. Since nothing was holding onto this new ILogConfiguration object, it would get deallocated.

I just pushed a new commit that merges the config NSDictionary into the static log configuration object. This way we don't need to hold onto the new ILogConfiguration object that is created because we are merging into the static log configuration. We also avoid copying the static log configuration object or needing to keep an extra static reference to it.

@lalitb
Contributor

lalitb commented Jan 22, 2026

(Note: I'm reviewing this from a C++ perspective and don't have expertise in Obj-C/iOS specifics.)

I'm having trouble connecting this change to the reported crash. In the Obj-C path, the existing code already uses a static ILogConfiguration, and FromJSON deep-copies string values (Variant stores its own std::string), so there shouldn't be a dangling-string lifetime issue.

This PR changes behavior in a few ways:

  • It mutates the singleton config in place rather than copying defaults into a local static
  • A second call (or concurrent call) will now merge values into the live LogManager's config instead of starting fresh

If the crash was due to dangling pointers, can you point to which config entries are void* in this iOS path? That's the only Variant type that does shallow copy. Without a stack trace or repro case, it's hard to validate that this addresses the root cause. Could you clarify the exact crash and how this change fixes it?

@thomasameisel
Contributor Author

thomasameisel commented Jan 22, 2026

> (Note: I'm reviewing this from a C++ perspective and don't have expertise in Obj-C/iOS specifics.)
>
> I'm having trouble connecting this change to the reported crash. In the Obj-C path, the existing code already uses a static ILogConfiguration, and FromJSON deep-copies string values (Variant stores its own std::string), so there shouldn't be a dangling-string lifetime issue.
>
> This PR changes behavior in a few ways:
>
>   • It mutates the singleton config in place rather than copying defaults into a local static
>   • A second call (or concurrent call) will now merge values into the live LogManager's config instead of starting fresh

LogManagerBase::Initialize does something similar when passing in a configuration object that is a different reference from the default. So while the ODWLogManager behavior is different, I believe the overall static log manager behavior should remain the same.

> If the crash was due to dangling pointers, can you point to which config entries are void* in this iOS path? That's the only Variant type that does shallow copy. Without a stack trace or repro case, it's hard to validate that this addresses the root cause. Could you clarify the exact crash and how this change fixes it?

@maxgolov elaborated on the issue with the copy in this comment:

> I think the issue is somewhere in copying the const char* values, as in - pointers to values, from temporary. When temporary container is gone, the memory used by container (map) values is reclaimed. Thus, the shallow copy of configuration values in a different spot is now referencing memory that's already been freed / reused / allocated by something else. This is obviously not an issue for older code, where you operated on 'default' configuration object (globally initialized once by static initializer), that is never destroyed. Please keep your incoming configuration object permanent, or static, that should solve it. One way to handle it for your scenario is to use magic static getter for your configuration object - move its populate in a separate method, and return a reference to static. This is thread-safe in C++. Pattern described here: https://blog.mbedded.ninja/programming/languages/c-plus-plus/magic-statics/#singletons
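The lifetime hazard described in that quote, and the owning alternative, can be sketched as below. Whether the SDK actually performs a shallow copy on this path is disputed earlier in the thread, so this only illustrates the general pattern. `OwnedValue`, `OwnedMap`, `BuildFromTemporary`, and the `host` key are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical value type that owns its text. Storing a raw const char*
// here instead would be a shallow copy: the pointer would dangle as soon
// as the container it points into is destroyed.
struct OwnedValue {
    std::string text;
    explicit OwnedValue(const char* s) : text(s) {}  // copies the characters
};

using OwnedMap = std::map<std::string, OwnedValue>;

// Builds a result from a short-lived source container.
OwnedMap BuildFromTemporary() {
    std::map<std::string, std::string> incoming;
    incoming["host"] = "example.invalid";

    OwnedMap result;
    for (const auto& kv : incoming) {
        // c_str() points into `incoming`; OwnedValue deep-copies it.
        result.emplace(kv.first, OwnedValue(kv.second.c_str()));
    }
    return result;  // `incoming` is destroyed here; the copies remain valid
}
```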

I attached 4 crash logs at the bottom of this issue. From what I can tell the issue is happening in the assignment operator so avoiding the need to use the assignment operator would fix the crash. I'd be happy to iterate on this change if you have another suggestion @lalitb.

@lalitb
Contributor

lalitb commented Jan 22, 2026

> I attached 4 crash logs at the bottom https://github.com/microsoft/cpp_client_telemetry_modules/issues/315. From what I can tell the issue is happening in the assignment operator so avoiding the need to use the assignment operator would fix the crash. I'd be happy to iterate on this change if you have another suggestion @lalitb.

A couple of the crash traces do show the same pattern: a crash while iterating ILogConfiguration’s internal VariantMap during the copy from GetLogConfiguration(). By the time the copy happens, the map is already corrupted.

This PR removes that copy, which should avoid this immediate crash. It doesn’t address why the map was corrupted. The ObjC surface doesn’t enforce thread-safe access to the shared config:

  • ODWLogConfiguration setters write to GetLogConfiguration() without locking
  • initForTenant:withConfig: has no synchronization guard
  • _initialized isn’t atomic

If multiple queues configure telemetry during startup, they can race on the shared config; the offending thread may have finished by the time we see the crash. Based on code and stack traces (not as an ObjC/iOS expert), adding synchronization in the ObjC layer (e.g., @synchronized or a serial queue) around config access seems like the right way to address the underlying race - worth a sanity check from someone more familiar with this wrapper.
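The kind of guard suggested above might look like the following, written here in C++ with `std::mutex` as an analog of `@synchronized` or a serial queue in the ObjC layer. `GuardedConfig` and its methods are hypothetical and are not part of the SDK or the wrapper.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical wrapper that serializes all access to a shared config map.
class GuardedConfig {
public:
    // Writers take the lock, so two setters never mutate the map at once.
    void Set(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_[key] = value;
    }

    // Readers get a copy made under the lock, so they never iterate the
    // map while another thread is changing it.
    std::map<std::string, std::string> Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return values_;
    }

    // Returns true only for the first caller; the flag is read and written
    // under the same lock rather than as a plain unsynchronized bool.
    bool InitializeOnce() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (initialized_) {
            return false;
        }
        initialized_ = true;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::string> values_;
    bool initialized_ = false;
};
```

Returning a snapshot trades an extra copy for never exposing the live map to readers; a real wrapper might prefer a different trade-off.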

@lalitb
Contributor

lalitb commented Jan 22, 2026

Thinking about this more, I believe that rather than fixing the underlying race in the SDK or the ObjC wrapper, the application should ensure serialized access to the SDK during initialization, e.g., configure and initialize from a single thread/queue rather than calling setters and initForTenant: from multiple queues concurrently.

Do you see any such pattern in your application?

@thomasameisel
Contributor Author

thomasameisel commented Jan 22, 2026

> Thinking more, I believe instead of fixing the SDK or ObjC wrapper - For the underlying race, the application should ensure serialized access to the SDK during initialization - e.g., configure and initialize from a single thread/queue rather than calling setters and initForTenant: from multiple queues concurrently.

Yes, I've verified through code inspection and setting breakpoints that in iOS Outlook we only configure the static log manager once from a single thread (either the main thread or a serial background queue). We also initialize new log managers and loggers from that same thread.

I'm wondering if the issue has to do with this comment in VariantType.hpp:

    union
    {
        int64_t     iV;
        double      dV;
        const char* sV;
        bool        bV;
        void*       pV;
    };

    // Unfortunately keeping object pointers inside the union above causes issues
    // with C++11 static initializer feature. The pointers get corrupted and calling
    // destructor via delete causes a crash with MSVC compiler (both 2015 and 2017).

Adding threading synchronization (or documentation that this API is not thread-safe) would make sense, but I'm not sure if it's the root cause of this crash.

@maxgolov
Contributor

I think your changes look good. But I'd like to share that we hit some very odd bit-packing issues with different compilers in the Microsoft Mesh product. Many of these issues were solved in my branch, which was unfortunately never merged into main. I'd suggest using Claude Sonnet/Opus 4.5 or GPT-5.2 to review the contents of the branch and see what changes could potentially be backported to main and/or your PR. I apologize for never merging this into main; we had a separate repo with the product-specific fixes. I am no longer involved in Observability, but you might find some fixes relevant to your cross-platform scenario. We did run our spinoff of the SDK on Apple Vision Pro, for example. It was working alright, but our usage patterns were potentially different from yours.

Sorry that I can't be of much help. #1122 and #1099 might be relevant. There was also a good comment in one of these PRs that some things are really specific to C++ compiler settings, e.g. packing and alignment. You may want to look closely at Apple Silicon ARM64 if you are hitting issues specifically on that newer platform.

I hope this isn't noise and that it helps. Use GitHub Copilot to navigate the changes, as they are quite extensive. But you should be able to grok them now, as they were written before vibe-coding became so prominently available.

@lalitb
Contributor

lalitb commented Jan 22, 2026

> I'm wondering if the issue has to do with this comment in VariantType.hpp:

That comment is about an old MSVC quirk: putting object pointers in the union caused issues with MSVC’s handling of magic statics and would crash on destruction. On Clang/iOS it doesn’t apply, and the code moved std::string/maps out of the union to avoid it. The crash stacks we’re seeing are during map assignment/initialization, not destruction, so this MSVC note isn’t likely related.
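The layout being described, with only trivial types in the union and the string held as a separate member, can be sketched like this. `SimpleVariant` is a hypothetical, much-reduced illustration and not the SDK's Variant class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical tagged value: the union holds only trivially copyable
// members, and the string lives outside it as an ordinary member.
class SimpleVariant {
public:
    enum class Type { Int, Double, Bool, String };

    static SimpleVariant FromInt(int64_t v) {
        SimpleVariant r(Type::Int);
        r.iV = v;
        return r;
    }
    static SimpleVariant FromDouble(double v) {
        SimpleVariant r(Type::Double);
        r.dV = v;
        return r;
    }
    static SimpleVariant FromBool(bool v) {
        SimpleVariant r(Type::Bool);
        r.bV = v;
        return r;
    }
    static SimpleVariant FromString(const std::string& v) {
        SimpleVariant r(Type::String);
        r.sV = v;  // deep copy owned by this object
        return r;
    }

    Type type() const { return type_; }
    int64_t asInt() const { return type_ == Type::Int ? iV : 0; }
    const std::string& asString() const { return sV; }

private:
    explicit SimpleVariant(Type t) : type_(t), iV(0) {}

    Type type_;
    union {
        int64_t iV;
        double dV;
        bool bV;
    };
    std::string sV;  // kept out of the union so copy and destroy just work
};
```

Because the string is a normal member, the compiler-generated copy constructor and destructor handle it correctly, and no manual `delete` of a pointer stored in the union is needed.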
