Skip to content

Fix misues of is_main_runtime_thread over is_main_browser_thread#15503

Merged
sbc100 merged 2 commits intomainfrom
fix_is_main
Nov 16, 2021
Merged

Fix misues of is_main_runtime_thread over is_main_browser_thread#15503
sbc100 merged 2 commits intomainfrom
fix_is_main

Conversation

@sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Nov 12, 2021

In all these places the code is really attempting to figure out if it is
the main runtime thread. i.e. the first place the program is loaded and
the place that runs the callback and async events send from secondary
threads.

The reason this mistake often goes unnoticed is that in almost all cases
the main runtime thread is also running on the main browser thread.

One easy way to see that is_main_browser_thread is the wrong question
to be asking in many of these cases is to remember that when emscripten
is started in a worker there is no main browser involved and so this
function will return false on all threads.

In the case of __timedwait.c and pthread_barrier_wait.c the desire
is to avoid blocking the main runtime thread so that calls from other
threads can be processed by emscripten_main_thread_process_queued_calls.
emscripten_main_thread_process_queued_calls is expected to always run
on the main runtime thread, and not necessarily on the main browser
thread. Indeed its first line is:

assert(emscripten_is_main_runtime_thread());

@sbc100 sbc100 requested review from juj, kleisauke and kripken November 12, 2021 02:42
@kripken
Copy link
Member

kripken commented Nov 12, 2021

This is probably a valid change, but just to be sure we're covering the other aspect here: the literal main browser thread is special in that we can't block there, and a lot of the "if main thread then do complex thing" is to work around that. If the main runtime thread is not the main browser thread then we don't need those hacks?

But OTOH the main runtime thread is special in that lots of stuff ends up proxied to it, so perhaps we do want to not block there either.

I think this can make sense to do, but given the above I think it is sort of a policy change as well. If so, the changelog might be worth updating.

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 13, 2021

This is probably a valid change, but just to be sure we're covering the other aspect here: the literal main browser thread is special in that we can't block there, and a lot of the "if main thread then do complex thing" is to work around that. If the main runtime thread is not the main browser thread then we don't need those hacks?

But OTOH the main runtime thread is special in that lots of stuff ends up proxied to it, so perhaps we do want to not block there either.

I think this can make sense to do, but given the above I think it is sort of a policy change as well. If so, the changelog might be worth updating.

I don't think its a policy change, but more of a bug fix.

In both these busy loops we call emscripten_main_thread_process_queued_calls which only works on the main runtime thread, so the expectation is clearly the its the main runtime thread.

If there was a case where emscripten_is_main_browser_thread was true but emscripten_is_main_runtime_thread then we would hit the assertion at the top of emscripten_main_thread_process_queued_calls.

Is it ever possible to run emscripten code on the main browser thread that is not also the main runtime thread? I think not right, since there would be now way to start a thread on the main browser thread if the application itself was started in a worker.

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 13, 2021

Let me try make the argument another way, there are currently four different places where we avoid blocking so we can call emscripten_main_thread_process_queued_calls in a loop

  1. __timedwait.c:

int is_main_thread = emscripten_is_main_browser_thread();

Here we are using emscripten_is_main_browser_thread, I believe in error.

  1. __wait.c:

int is_main_thread = emscripten_is_main_runtime_thread();

Here we are using emscripten_is_main_runtime_thread, I believe correctly.

  1. pthread_barrier_wait.c:

int is_main_thread = emscripten_is_main_browser_thread();

Here we are using emscripten_is_main_browser_thread, I believe in error.

  1. __pthread_join_js:

if (!ENVIRONMENT_IS_PTHREAD) _emscripten_main_thread_process_queued_calls();

Here we are using if (!ENVIRONMENT_IS_PTHREAD) which is the same as emscripten_is_main_runtime_thread... so I believe this is correct.

I believe that in all of these cases the busy loop is intended for the runtime thread, but in two of them we mistakenly check for the browser thread. The net result is that program that start in a worker will not work.. but since that is not common we have not noticed this bug.

@kleisauke
Copy link
Collaborator

Is it ever possible to run emscripten code on the main browser thread that is not also the main runtime thread?

Perhaps the main runtime thread of Node.js counts as non-main browser thread? IIUC, blocking on the main runtime thread in Node.js is fine, see for example this check:

// We can do a normal blocking wait anywhere but on the main browser thread.
if (!ENVIRONMENT_IS_WEB) {

Currently however, both emscripten_is_main_runtime_thread() and emscripten_is_main_browser_thread() are the same on Node.js.

#include <stdio.h>
#include <emscripten/threading.h>

int main() {
    printf("emscripten_is_main_runtime_thread() = %d\n", emscripten_is_main_runtime_thread());
    printf("emscripten_is_main_browser_thread() = %d\n", emscripten_is_main_browser_thread());
    return 0;
}
$ emcc test.c -o test.js
$ node test.js
emscripten_is_main_runtime_thread() = 1
emscripten_is_main_browser_thread() = 1

(I've tried to change this behavior in the past with commit kleisauke@36236cf but unfortunately it didn't work out)

@juj
Copy link
Collaborator

juj commented Nov 15, 2021

The net result is that program that start in a worker will not work..

The history here is that running pthreads enabled builds in PROXY_TO_WORKER mode (or manually launching an Emscripten pthreads compiled app in a Worker) has not been supported.

To recall,

  • main_browser thread is always the main browser environment,
  • main_runtime thread is the thread that contains the canonical copy of the JS side global state of the program (e.g. contents of MEMFS). In particular main_runtime thread is not necessarily the thread that launches user's main() function.

main_browser thread cannot Atomics.wait(), so it must busy spin hack.
main_runtime thread must stay responsive to process incoming proxy requests for e.g. filesystem access.

Very originally in ~2015 or so when this was written, pthreads build mode had its own proxy-to-worker mode that was a bit similar to PROXY_TO_WORKER, but implemented independently to avoid complicating the existing non-pthreads worker builds mode. That original build mode would have main runtime thread be different than the main runtime thread. That mode cared about distinguishing main_browser & main_runtime threads, which would be separate threads.

That mode was some time after, 2015-1016 dropped and replaced with the current PROXY_TO_PTHREAD build mode as it is considerably simpler (just one pass-through main() wrapper to compile in between the code). In that build mode, the main_browser thread will be the same as the main_runtime thread, but the main_browser thread will only launch user main() in a manually created pthread. Hence user main() will not be running on the main_runtime thread (nor on the main_browser) thread. In that build mode, main_browser & main_runtime threads are still always the same (main browser) thread.

In 2019-2020 or so Alon commented that pthreads + PROXY_TO_WORKER should be supported, so that support was opened up. However there are limitations to that, e.g. it requires nested Worker support, and causes limitations to MAIN_THREAD_EM_ASM and other proxying architecture, since those use postMessage(), they will always post to their parent thread, which in PROXY_TO_WORKER builds will not be the main_browser thread, but the proxy owning Worker thread. MAIN_THREAD_EM_ASM() would be more precisely called MAIN_RUNTIME_THREAD_EM_ASM(), which might not be the main browser thread in this kind of complex PROXY_TO_WORKER build mode.

This is probably a valid change, but just to be sure we're covering the other aspect here: the literal main browser thread is special in that we can't block there, and a lot of the "if main thread then do complex thing" is to work around that. If the main runtime thread is not the main browser thread then we don't need those hacks?

There are two reasons that the spinloop exists there: main_browser thread cannot Atomics.wait(), so it must fully busyspin, and main_runtime thread must stay responsive to process incoming proxy requests, so it can at best sleep in small slices.

I don't think its a policy change, but more of a bug fix.

Agreed. This is a fallout of the relaxed pthreads + PROXY_TO_WORKER build mode compatibility. If we did not support that, the concepts of the two threads main_browser and main_runtime could be fused just to one.

Given the subtlety here, it would be good to have a test for this in the suite if possible?

Is it ever possible to run emscripten code on the main browser thread that is not also the main runtime thread? I think not right, since there would be now way to start a thread on the main browser thread if the application itself was started in a worker.

This does not sound like something we need to support. If user is doing this kind of more complex pthreads + PROXY_TO_WORKER build, we can assume they are sensible, and don't then try to post the module back to main browser thread to attempt to have that act as some kind of extra worker thread that would not be the main runtime thread.

So emscripten_is_main_browser_thread() being true does imply emscripten_is_main_runtime_thread() being true.

Perhaps the main runtime thread of Node.js counts as non-main browser thread? IIUC, blocking on the main runtime thread in Node.js is fine,

I haven't closely followed the Node.js threading side, though if Node.js differs from the browser by allowing blocking on the main Node thread, then that would mean that we can Atomics.wait() to sleep there, but will still need to wait at most small time slices on the runtime thread to stay responsive to handle incoming proxy requests.

#ifdef __EMSCRIPTEN__
double msecsToSleep = top ? (top->tv_sec * 1000 + top->tv_nsec / 1000000.0) : INFINITY;
int is_main_thread = emscripten_is_main_browser_thread();
int is_main_thread = emscripten_is_main_runtime_thread();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be most accurate, this code should read

int can_futex_wait_on_current_thread = !emscripten_is_main_browser_thread() || current_environment_is_node/*node.js main thread can do Atomics.wait() */;
int is_main_runtime_thread = emscripten_is_main_runtime_thread();

...

          if ((!can_futex_wait_on_current_thread || is_main_runtime_thread) || ...) {

                       ...

			// Assist other threads by executing proxied operations that are effectively singlethreaded.
                        if (is_main_runtime_thread) emscripten_main_thread_process_queued_calls();
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that should be necessary because the emscripten_futex_wait is already safe for all on the main thread and will busy look in a tight loop. There should be no need to also have this outer busy loop that calls emscripten_futex_wait with smaller values.

In fact, as a followup I think we can remove the is_main_thread part of this condition completely, but I'll leave that as a followup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another way of putting it: both code paths here call emscripten_futex_wait... avoiding calling emscripten_futex_wait is not what this code does. This code instead relies on being able to call emscripten_futex_wait regardless.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good observation - I did not recall that emscripten_futex_wait was already guarded for main thread to be able to call it. lgtm then.

a_inc(&inst->finished);
#ifdef __EMSCRIPTEN__
int is_main_thread = emscripten_is_main_browser_thread();
int is_main_thread = emscripten_is_main_runtime_thread();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here likewise to the above code.


// main thread may need to run proxied calls, so sleep in very small slices to be responsive.
const double maxMsecsSliceToSleep = emscripten_is_main_browser_thread() ? 1 : 100;
const double maxMsecsSliceToSleep = emscripten_is_main_runtime_thread() ? 1 : 100;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good - might be good to change the comment to say // main runtime thread may need to ...


// Ensure that files are preloaded from the main thread.
assert(emscripten_is_main_browser_thread());
assert(emscripten_is_main_runtime_thread());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two wasmfs changes lgtm.

In all these places the code is really attempting to figure out if it is
the main runtime thread.  i.e. the first place the program is loaded and
the place that runs the callback and async events send from secondary
threads.

The reason this mistake often goes unnoticed is that in almost all cases
the main runtime thread is also running on the main browser thread.

One easy way to see that `is_main_browser_thread` is the wrong question
to be asking in many of these cases is to remember that when emscripten
is started in a worker there is no main browser involved and so this
function will return false on *all* threads.

In the cast of `__timedwait.c` and `pthread_barrier_wait.c` the desire
is to avoid blocking the main runtime threads so that calls from other
threads can be processed by `emscripten_main_thread_process_queued_calls`.
`emscripten_main_thread_process_queued_calls` is expected to always run
on the main runtime thread, and not necessarily on the main browser
thread.  Indeed its first line is:

 `assert(emscripten_is_main_runtime_thread());`
@sbc100 sbc100 merged commit b347ced into main Nov 16, 2021
@sbc100 sbc100 deleted the fix_is_main branch November 16, 2021 17:10
mmarczell-graphisoft pushed a commit to GRAPHISOFT/emscripten that referenced this pull request Jan 5, 2022
…mscripten-core#15503)

In all these places the code is really attempting to figure out if it is
the main runtime thread.  i.e. the first place the program is loaded and
the place that runs the callback and async events send from secondary
threads.

The reason this mistake often goes unnoticed is that in almost all cases
the main runtime thread is also running on the main browser thread.

One easy way to see that `is_main_browser_thread` is the wrong question
to be asking in many of these cases is to remember that when emscripten
is started in a worker there is no main browser involved and so this
function will return false on *all* threads.

In the cast of `__timedwait.c` and `pthread_barrier_wait.c` the desire
is to avoid blocking the main runtime threads so that calls from other
threads can be processed by `emscripten_main_thread_process_queued_calls`.
`emscripten_main_thread_process_queued_calls` is expected to always run
on the main runtime thread, and not necessarily on the main browser
thread.  Indeed its first line is:

 `assert(emscripten_is_main_runtime_thread());`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants