Skip to content

Work-around for deadlock in DbcCtl instance initialization when loading plugins.#10143

Closed
ywkaras wants to merge 1 commit intoapache:masterfrom
ywkaras:DbgCtl_deadlock2
Closed

Work-around for deadlock in DbcCtl instance initialization when loading plugins.#10143
ywkaras wants to merge 1 commit intoapache:masterfrom
ywkaras:DbgCtl_deadlock2

Conversation

@ywkaras
Copy link
Copy Markdown
Contributor

@ywkaras ywkaras commented Aug 3, 2023

No description provided.

@ywkaras ywkaras self-assigned this Aug 3, 2023
@ywkaras ywkaras added the Debug Support for system debugging label Aug 3, 2023
@ywkaras ywkaras added this to the 10.0.0 milestone Aug 3, 2023
@ywkaras ywkaras linked an issue Aug 3, 2023 that may be closed by this pull request
Copy link
Copy Markdown
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix the deadlock, since the DbgCtl lock will be grabbed before the dl_load_lock across the two previously deadlocking threads.

This will block any log messages during the course of loading the plugins, however. Can you explain why you prefer this over your Regex thread_local initialization patch in #10142? As long as we made the Regex type thead_local, it resolved the deadlock by doing the PCRE jit initialization before grabbing the DbgCtl lock.

@ywkaras
Copy link
Copy Markdown
Contributor Author

ywkaras commented Aug 3, 2023

This should fix the deadlock, since the DbgCtl lock will be grabbed before the dl_load_lock across the two previously deadlocking threads.

This will block any log messages during the course of loading the plugins, however. Can you explain why you prefer this over your Regex thread_local initialization patch in #10142? As long as we made the Regex type thead_local, it resolved the deadlock by doing the PCRE jit initialization before grabbing the DbgCtl lock.

The only purpose of this call: ywkaras@255a910#diff-4e74a9396ab1bce4465550cb337cffd79961132d37cb228db69d67281606f2a4R96
is to trigger a call to get_jit_stack() in Regex.cc. So, it logically makes no difference if r is thread_local. Presumably, making it thread_local just changes the outcome of the race condition when the Au tests are run, which prevents the problem from manifesting.

@cmcfarlen
Copy link
Copy Markdown
Contributor

This should fix the deadlock, since the DbgCtl lock will be grabbed before the dl_load_lock across the two previously deadlocking threads.

This will block any log messages during the course of loading the plugins, however. Can you explain why you prefer this over your Regex thread_local initialization patch in #10142? As long as we made the Regex type thead_local, it resolved the deadlock by doing the PCRE jit initialization before grabbing the DbgCtl lock.

It would only block Debug macros the first time they are encountered in a codepath. Places using Dbg or any subsequent encounters of Debug macros would not need to hold this lock as it's only necessary to setup the DbgCtl instance, not to log a message.

I think this workaround is only necessary until everything uses Dbg vs Debug right @ywkaras? If all of the tags are setup at static initializer time, then there shouldn't be any deadlocking, even for dlopen static initializers.

@bneradt
Copy link
Copy Markdown
Contributor

bneradt commented Aug 3, 2023

I verified in a vm in CI that I no longer see the deadlock with this patch applied. Thanks for working on this, @ywkaras .

@ywkaras
Copy link
Copy Markdown
Contributor Author

ywkaras commented Aug 3, 2023

I think we should retain the work-around permanently. DbgCtl instances may be local. And DbgCtl::update() also locks the registry mutex.

@cmcfarlen
Copy link
Copy Markdown
Contributor

Another possible solution to this could be to have the regex matcher that is used be one that either doesn't use a jit stack or has its own dedicated jit stack (avoiding accessing the thread local). This would consolidate the fix to just Dbg and avoid the incidental complexity around dlopen. Perhaps we could add a flag or option to Regex::compile to support this. Thoughts @ywkaras @bneradt?

@ywkaras
Copy link
Copy Markdown
Contributor Author

ywkaras commented Aug 3, 2023

Another possible solution to this could be to have the regex matcher that is used be one that either doesn't use a jit stack or has its own dedicated jit stack (avoiding accessing the thread local). This would consolidate the fix to just Dbg and avoid the incidental complexity around dlopen. Perhaps we could add a flag or option to Regex::compile to support this. Thoughts @ywkaras @bneradt?

Yes, that's probably better. I'm not well-versed in the PCRE lib. It looks like the JIT stack is only used to compile. Or, does the compiled pattern refer to it, and use it for matching? Probably not, since a compiled regex could be accessed from multiple threads.

@ywkaras
Copy link
Copy Markdown
Contributor Author

ywkaras commented Aug 3, 2023

Actually, looks like we're probably using PCRE wrong. I think each compiled regex needs mutex protection and it's own JIT stack. Either that, or a single global mutex for all regexes, if they're sharing a JIT stack.

@zwoop
Copy link
Copy Markdown
Contributor

zwoop commented Aug 3, 2023

We should also consider implications of moving to PCRE2 for ATS v10. I don't know if that helps (or makes things worse here), but we really need to get to PCRE2 ASAP.

Also, while we're talking, can we not remove the old Debug() macro in ATS v10? I think the implications are small, migrating non-core plugins to the new API is easy. The core ought to all already be moved to the new API, if not, shame on Walt.

@ywkaras
Copy link
Copy Markdown
Contributor Author

ywkaras commented Aug 3, 2023

False alarm, Regex is fine. get_jit_stack() is a callback, called each time Regex::exec() is called. So it will return the dedicated stack for the thread exec() is running on.

@ywkaras ywkaras marked this pull request as draft August 3, 2023 23:41
@ywkaras
Copy link
Copy Markdown
Contributor Author

ywkaras commented Aug 3, 2023

Shame on me for not sending commandos to get my PRs reviewed at gunpoint? My employer doesn't have a big commando budget like some people's.

@ywkaras ywkaras closed this Aug 7, 2023
@zwoop zwoop removed this from the 10.0.0 milestone Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Debug Support for system debugging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock between DbgCtl and PCRE

4 participants