Skip to content

Inline pthread worker.js file into the main output file#21701

Merged
sbc100 merged 1 commit intoemscripten-core:mainfrom
sbc100:pthread_inline
Apr 16, 2024
Merged

Inline pthread worker.js file into the main output file#21701
sbc100 merged 1 commit intoemscripten-core:mainfrom
sbc100:pthread_inline

Conversation

@sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Apr 5, 2024

This method has many advantages over the previous method of generating a separate file:

  • Avoids separate output file simplifying deployment.
  • Avoids confusing laying of global scopes
  • Avoids exporting symbols on the Module simply for visibility within the worker file.
  • Avoids code duplication
  • Avoids the needs to importScripts call, and the node polyfill for this.
  • Allows optimizers such as closure and JSDCE to operate on the combined code.
  • -sSINGLE_FILE now works with pthreads
  • Fewer network requests
  • No need for locateFile logic to run on the worker to find the worker.js

The test_pthread_custom_pthread_main_url test was completely removed
since it was testing how worker.js was located and worker.js no longer
exists.

Fixes: #9796

@sbc100 sbc100 requested review from RReverser, dschuff and kripken April 5, 2024 17:13
@sbc100 sbc100 force-pushed the pthread_inline branch 5 times, most recently from 50b906f to 5691c6f Compare April 5, 2024 18:47
@sbc100 sbc100 force-pushed the pthread_inline branch 5 times, most recently from 256a82f to 23f46b0 Compare April 6, 2024 02:05
@msqr1
Copy link

msqr1 commented Apr 6, 2024

Would you consider doing this for wasm workers and audio worklets? Also, if this has many advantages, I would suggest making this the default, while keeping option to NOT embed the wasm file, for reduced size, module caching and stuff.

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 6, 2024

Would you consider doing this for wasm workers and audio worklets? Also, if this has many advantages, I would suggest making this the default, while keeping option to NOT embed the wasm file, for reduced size, module caching and stuff.

Absolutely! One step a time though. Getting pthreads done first because it probably most used (and likely the hardest to get working) so a good proof of concept.

@msqr1
Copy link

msqr1 commented Apr 6, 2024

What about this being default and wasm file embedding? What do you think? We can make another setting called EMBED_WASM

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 6, 2024

What about this being default and wasm file embedding? What do you think? We can make another setting called EMBED_WASM

My plan is to make this not only the default, but the only option. I don't see any reason at this point to keep the old method around.

Don't we already have -sSINGLE_FILE to controle the Wasm embedding? I don't see that ever being the default BTW because it comes with several downsides, the most major of which it prevent streaming compilation and caching the wasm module.

@msqr1
Copy link

msqr1 commented Apr 6, 2024

SINGLE_FILE seems to mean embed everything into one file, not just the wasm, I think we should have another name for embedding or separate the wasm since this can be quite misleading

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 6, 2024

SINGLE_FILE seems to mean embed everything into one file, not just the wasm, I think we should have another name for embedding or separate the wasm since this can be quite misleading

But if we remove the separate worker files, what other file are there? Another way of putting it: What files would you not want to embed when you use -sSINGLE_FILE?

@msqr1
Copy link

msqr1 commented Apr 6, 2024

You're right, my mistake. But should we just rename the settings anyway? I'm really excited for this to land ngl

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 6, 2024

You're right, my mistake. But should we just rename the settings anyway? I'm really excited for this to land ngl

Doesn't SINGLE_FILE make sense? IIUC it is normally by for folks who don't want to worry about hosting multiple files, and want do distribute just one.. so that name seem to match the purpose. What about it do you think is confusing?

@msqr1
Copy link

msqr1 commented Apr 7, 2024

We should use a global switch statement for this? For simplicity I suggest just doing globalThis.constructor.name == "xxxGlobalScope", not using the ENVIRONMENT_IS_XXX

@curiousdannii
Copy link
Contributor

curiousdannii commented Apr 7, 2024

Rather than using a query string (which may complicate caching), can you use the name option?

Only potential issue I can see is that it's not supported in Android Webview, but I'm not sure if that's a supported browser.

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 7, 2024

Rather than using a query string (which may complicate caching), can you use the name option?

Sure we can try that. Can you explain what you mean by "may complicate caching"?

Only potential issue I can see is that it's not supported in Android Webview, but I'm not sure if that's a supported browser.

@curiousdannii
Copy link
Contributor

curiousdannii commented Apr 7, 2024

Just that many caches may distinguish query strings. It may not be added to service worker caches either. I was thinking of a case where you add all the files in a service worker for offline use, but then it fails because of the query string.

If someone was using a service worker they could work around that though. But it might be more predictable to avoid it possible. I don't know that the name property is a viable alternative however.

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 7, 2024

Just that many caches may distinguish query strings. It may not be added to service worker caches either. I was thinking of a case where you add all the files in a service worker for offline use, but then it fails because of the query string.

If someone was using a service worker they could work around that though. But it might be more predictable to avoid it possible. I don't know that the name property is a viable alternative however.

Are you saying that adding the query string would mean that the file would not cached at all, or that maybe we would end up with two seperate cache entries (one for the main.js and another for the main.js?phthread=1)? I like the idea of using name since we do control that.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@msqr1
Copy link

msqr1 commented Apr 16, 2024

In my opinion, if users upgraded for this feature, they would expect to update their deployment pipelines, too. Why are we complicating stuff up on our end to create a placeholder worker file?

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 16, 2024

In my opinion, if users upgraded for this feature, they would expect to update their deployment pipelines, too. Why are we complicating stuff up on our end to create a placeholder worker file?

The idea is that with this change user don't need to update their pipelines and they can continue to ship a dummy/unused worker.js file after updating and then they can update their pipelines at their leisure after that update.

Then in a future release we can remove the placeholder file.

That way we can we can tease apart any breakages that are due to the internals of this change from breakages that are due to the change in the number of output files.

Maybe its not worth it but @kripken and I thought it would be a good idea when we discussed it yesterday.

This method has many advantages over the previous method of generating
a separate file:

- Avoids separate output file simplifying deployment.
- Avoids confusing laying of global scopes
- Avoids exporting symbols on the Module simply for visibility within
  the worker file.
- Avoids code duplication
- Avoids the needs to importScripts call, and the node polyfill for
  this.
- Allows optimizers such as closure and JSDCE to operate on the combined
  code.
- `-sSINGLE_FILE` now works with pthreads
- Fewer network requests
- No need for locateFile logic to run on the worker to find the worker.js

The test_pthread_custom_pthread_main_url test was completely removed
since it was testing how worker.js was located and worker.js no longer
exists.

test_pthread_safe_stack depends on `onAbort` being proxied back to the
main thread.  In this case the `onAbort` handler is injected
conditionally in `--pre-js=browser_report.js`.  In the previous code
this meant that the proxied version took precedence because the pthread
handler override was injected first.

test_pthread_asan_use_after_free_2_wasmfs depends on `printErr` not
being proxied back to the main thread.  It is injected unconditionally
during `--pre-js`.  In the previous code path this means that
non-proxied version took precedence because it overrode the incoming
pthread handler.

Fixes: emscripten-core#9796
@msqr1
Copy link

msqr1 commented Apr 16, 2024

Backward compatibility I see. But we still eventually need to remove it to not confuse folks thinking SINGLE_FILE doing exactly what it says.

@sbc100
Copy link
Collaborator Author

sbc100 commented Apr 16, 2024

Backward compatibility I see. But we still eventually need to remove it to not confuse folks thinking SINGLE_FILE doing exactly what it says.

Yes, I'll probably just wait a month or so and then remove it.

@sbc100 sbc100 mentioned this pull request Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add option to inline webworker

7 participants