assert in mono_marshal_ilgen_init #83804

@tmds

We're investigating crashes that occur when source-building .NET on a large ppc64le machine (100+ CPUs, 100+ GB RAM).

The stack trace contains the following thread:

Thread 8 (Thread 0x20001300f080 (LWP 152780) ".NET ThreadPool"):
#0  0x00002000006142b4 in wait4 () from /lib64/libc.so.6
#1  0x000020000061411c in waitpid () from /lib64/libc.so.6
#2  0x0000200000b6fa50 in mono_dump_native_crash_info () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#3  0x0000200000b1efac in mono_handle_native_crash () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#4  0x0000200000b6edd8 in sigabrt_signal_handler () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#5  <signal handler called>
#6  0x00002000005b30cc in __pthread_kill_implementation () from /lib64/libc.so.6
#7  0x000020000055223c in raise () from /lib64/libc.so.6
#8  0x000020000052c70c in abort () from /lib64/libc.so.6
#9  0x0000200000bb6248 in monoeg_assert_abort () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#10 0x00002000009e07d4 in mono_log_write_logfile () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#11 0x00002000009dbb44 in structured_log_adapter () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#12 0x0000200000bb67fc in monoeg_g_logv_nofree () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#13 0x0000200000bb6960 in monoeg_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#14 0x0000200000bb69dc in mono_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#15 0x00002000009c3c04 in mono_marshal_ilgen_init () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#16 0x00002000009c3ad8 in mono_emit_marshal_ilgen () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#17 0x0000200000926db0 in mono_emit_marshal () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#18 0x00002000009cc074 in emit_native_wrapper_ilgen () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#19 0x0000200000928060 in mono_marshal_get_native_wrapper () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#20 0x0000200000a73cdc in mono_jit_compile_method_with_opt () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#21 0x0000200000a6d134 in mono_jit_compile_method () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#22 0x0000200000b22144 in common_call_trampoline () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#23 0x0000200000b21c38 in mono_magic_trampoline () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
...

The interesting part is:

...
#14 0x0000200000bb69dc in mono_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#15 0x00002000009c3c04 in mono_marshal_ilgen_init () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
...

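There are no line numbers, but after spelunking through the code, I think we may be hitting the assert on the first line of this function, which mono_marshal_ilgen_init calls (and which was presumably inlined, since it has no frame of its own in the trace):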

void
mono_install_marshal_callbacks_ilgen (MonoMarshalIlgenCallbacks *cb)
{
	g_assert (!ilgen_cb_inited);
	g_assert (cb->version == MONO_MARSHAL_CALLBACKS_VERSION);
	memcpy (&ilgen_marshal_cb, cb, sizeof (MonoMarshalIlgenCallbacks));
	ilgen_cb_inited = TRUE;
}

I imagine this can happen when multiple threads call mono_marshal_ilgen_init concurrently, which would be more likely on a machine with many cores. Or is there something that ensures only a single thread performs the initialization?
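
To make the suspected race concrete: if two threads both see the callbacks as not yet installed and both reach the install function, the second one trips the "already initialized" assert. Below is a minimal sketch of one way such an initializer could be guarded, using POSIX pthread_once. It is not the Mono code and not a proposed patch; every demo_* name, the DemoCallbacks struct, and the version constant are hypothetical stand-ins that only mirror the shape of the function quoted above.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real Mono definitions. */
#define DEMO_CALLBACKS_VERSION 1
#define DEMO_MAX_THREADS 64

typedef struct {
	int version;
	int (*emit) (int);
} DemoCallbacks;

static DemoCallbacks demo_cb;
static int demo_cb_inited;
static int demo_install_count;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static int
demo_emit (int x)
{
	return x + 1;
}

/* Same shape as the quoted install function: aborts if called twice. */
static void
demo_install_callbacks (const DemoCallbacks *cb)
{
	assert (!demo_cb_inited);
	assert (cb->version == DEMO_CALLBACKS_VERSION);
	memcpy (&demo_cb, cb, sizeof (DemoCallbacks));
	demo_cb_inited = 1;
	demo_install_count++;
}

/* Runs at most once per process, no matter how many threads race here. */
static void
demo_init_once (void)
{
	DemoCallbacks cb = { DEMO_CALLBACKS_VERSION, demo_emit };
	demo_install_callbacks (&cb);
}

/* Safe to call from any thread; late callers block until init is done. */
static void
demo_ensure_init (void)
{
	pthread_once (&demo_once, demo_init_once);
}

static void *
demo_worker (void *arg)
{
	(void) arg;
	demo_ensure_init ();
	return NULL;
}

/* Starts up to n threads that all race to initialize, joins them,
 * and returns how many times the install function actually ran. */
static int
demo_run_threads (int n)
{
	pthread_t threads[DEMO_MAX_THREADS];
	int created = 0;

	if (n > DEMO_MAX_THREADS)
		n = DEMO_MAX_THREADS;
	for (int i = 0; i < n; i++) {
		if (pthread_create (&threads[created], NULL, demo_worker, NULL) == 0)
			created++;
	}
	for (int i = 0; i < created; i++)
		pthread_join (threads[i], NULL);
	return demo_install_count;
}
```

Calling demo_run_threads (16) from a main function should leave the install count at 1. If demo_worker instead checked demo_cb_inited and called demo_init_once directly, two threads could both pass the check before either set the flag, which is the pattern suspected in this issue.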

cc @lambdageek @omajid @Swapnali911
