Hmm, this is tricky, because it adds un-DCEable code that cannot be optimized away. How about instead:
That way we would not build more unoptimizable runtime bits. I know the increase in this PR does not add much, but it feels to me that we should strongly prefer to add only DCEable runtime elements. Or I suppose if the references to Basically the pthread pool is intended to be an emulation setting for use cases where sync pthread_create() is absolutely required, so I would not want to see all users have to pay bytes for that emulation setting. |
|
Fair point about code size. I feel it's a little clunky to need to set an option in order to get I did consider having special values for "number of cores" or such, but there are going to be more general cases that that doesn't handle, so the Module option seems best. |
| addOnPreRun(function() { | ||
| if (typeof SharedArrayBuffer !== 'undefined') { | ||
| addRunDependency('pthreads'); | ||
| PThread.allocateUnusedWorkers(pthreadPoolSize, function() { |
Uh hmm we have now both PThread.allocateUnusedWorkers and PThread.createNewWorkers(pthreadPoolSize);, that seems very wrong. To me this looks like we are allocating 2x the amount of Workers that are needed? Or perhaps I am very confused here. The intent of #9394 where this came into existence does not make sense to me. hmm...
I believe the idea is that creating and instantiating the workers is a separate thing. One can create the Worker objects without blocking startup on being able to instantiate them with the info they require etc. But we can discuss in your comments over there.
However, all of that is separate from this PR, and should not block it, unless I'm missing something?
Perhaps this PR can be relayered on top of https://github.com/emscripten-core/emscripten/pull/10269/files#diff-db41bea94577c2dd9b0eef0308b06cf9R36 ?
Maybe we can relayer on top of that, but that one seems to need discussion (so we figure out the history there) so unclear how long it will take to land, while this one is ready to go, isn't it? 😄 Or is there some reason you'd prefer to delay landing this?
| if (!ENVIRONMENT_IS_PTHREAD) addOnPreRun(function() { if (typeof SharedArrayBuffer !== 'undefined') { addRunDependency('pthreads'); PThread.allocateUnusedWorkers({{{PTHREAD_POOL_SIZE}}}, function() { removeRunDependency('pthreads'); }); }}); | ||
| #if USE_PTHREADS && PTHREAD_POOL_SIZE | ||
| var pthreadPoolSize = Module['pthreadPoolSize'] || {{{ PTHREAD_POOL_SIZE }}}; |
If one wanted to set Module['pthreadPoolSize'] = 0; to disable pthread pool, this has the effect of using {{{PTHREAD_POOL_SIZE}}} instead?
Oh, good point. Perhaps this should be more careful then.
Another thought I had meanwhile, if we're considering delaying this PR anyhow as discussed above - perhaps PTHREAD_POOL_SIZE could be a string - then the user could just set it to navigator.hardwareConcurrency directly, what do you think?
I like that idea - they could then set it to Module['pthreadPoolSize'] as well to customize.
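The zero-override concern raised in this thread can be sketched roughly as follows. This is only an illustration, not the code that landed: `poolSizeWithOr`, `poolSizeWithCheck` and `compiledDefault` are hypothetical names, with `compiledDefault` standing in for the preprocessed `{{{ PTHREAD_POOL_SIZE }}}` value.

```javascript
// Hypothetical helpers for illustration; not part of the emscripten source.
// `compiledDefault` stands in for the preprocessed {{{ PTHREAD_POOL_SIZE }}}.

// Mirrors the `Module['pthreadPoolSize'] || default` pattern from the diff.
// Because 0 is falsy, an explicit 0 is silently replaced by the default.
function poolSizeWithOr(moduleObj, compiledDefault) {
  return moduleObj['pthreadPoolSize'] || compiledDefault;
}

// One possible "more careful" variant: fall back only when the option
// is absent, so an explicit 0 (pool disabled) is respected.
function poolSizeWithCheck(moduleObj, compiledDefault) {
  var value = moduleObj['pthreadPoolSize'];
  return (typeof value !== 'undefined') ? value : compiledDefault;
}
```

With a compiled default of 4, `poolSizeWithOr({ pthreadPoolSize: 0 }, 4)` yields 4, while `poolSizeWithCheck({ pthreadPoolSize: 0 }, 4)` yields 0.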
| #endif | ||
| PThread.preallocatedWorkers = PThread.createNewWorkers(pthreadPoolSize); | ||
| } | ||
| #endif // PTHREAD_POOL_SIZE |
I am not sure why this code block even exists.. this seems to double allocate a new pool of Workers that has already been created in the code below in src/preamble.js?
|
Ok, rewritten to simply make This makes |
Without the linker option `PTHREAD_POOL_SIZE` set, the wasm backend does not create any webworkers and spins forever waiting for them to be created (crashing the browser). This PR sets `PTHREAD_POOL_SIZE` to `Math.min(4, Math.max(1, (navigator.hardwareConcurrency || 1) / 2))`, meaning it will use half of the logical cores available, up to a max of 4 (or only 1 if `navigator.hardwareConcurrency` is not available). See [this comment](emscripten-core/emscripten#10263 (comment)) for details on why this works.

This option apparently supersedes the value passed to [`pthreadpool_create` in `backend.cc`](https://github.com/tensorflow/tfjs/blob/c9dfebddfa34f531d2f0c363b6ea574a9fe9745d/tfjs-backend-wasm/src/cc/backend.cc#L65-L66), but its current setting should match the value computed in the `.cc` file. Fixes #4932.

This PR also fixes a bug where webworkers created by the wasm backend were not removed when `dispose` is called. Fixes #4796. Fixes #4934 as well.
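To make the pool-size expression quoted above concrete, here is a small sketch that evaluates it for a few core counts. `defaultPoolSize` is a hypothetical wrapper introduced only for this illustration; it takes the core count as a parameter so it can run outside a browser, whereas the quoted expression reads `navigator.hardwareConcurrency` directly.

```javascript
// Hypothetical wrapper around the expression quoted in the PR description.
// `hardwareConcurrency` plays the role of navigator.hardwareConcurrency.
function defaultPoolSize(hardwareConcurrency) {
  return Math.min(4, Math.max(1, (hardwareConcurrency || 1) / 2));
}

console.log(defaultPoolSize(undefined)); // 1   (value unavailable)
console.log(defaultPoolSize(2));         // 1
console.log(defaultPoolSize(4));         // 2
console.log(defaultPoolSize(16));        // 4   (capped)
console.log(defaultPoolSize(3));         // 1.5 (no rounding in the expression)
```

Note that the expression as quoted does not round, so an odd core count gives a fractional result; whether the consumer of the value truncates it is not stated in the description.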
This does the same as PTHREAD_POOL_SIZE but can be specified
at runtime. While the compile-time setting is great for testing, in
practice I think the Module option is probably more useful, since lots
of projects want to use a pool equal to the number of actual cores
on the user's machine.
fixes #10231
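As a rough sketch of the use case described in this commit message, a page could size the pool from the machine's core count before the runtime starts. This assumes the `pthreadPoolSize` Module option proposed in this PR; `makeModuleConfig` is a hypothetical helper, and the fallback of 1 is an assumption for environments where `hardwareConcurrency` is unavailable.

```javascript
// Hypothetical helper; `nav` is expected to look like the browser's
// `navigator` object. Assumes the Module['pthreadPoolSize'] option
// proposed in this PR.
function makeModuleConfig(nav) {
  // Fall back to 1 when hardwareConcurrency is missing (assumption).
  var cores = (nav && nav.hardwareConcurrency) || 1;
  return { pthreadPoolSize: cores };
}

// In a page this might be used as:
//   var Module = makeModuleConfig(navigator);
```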