Conversation

@sbc100
Collaborator

@sbc100 sbc100 commented Jul 14, 2025

Setting this to zero doesn't provide much value to anyone, since the conditional usage is relatively cheap (compared to the fallback) and the API is available in all browsers these days.

We still need the fallback for things like audio worklets and JS shell environments.
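
For context, here is a minimal sketch of the trade-off described above: the conditional use of TextDecoder is a single guard, while the fallback is a hand-written decode loop. This is illustrative only, not Emscripten's actual library code, and the names (utf8ToString, UTF8Decoder) are placeholders.

// Sketch only: illustrates the conditional usage vs. the fallback, not the
// exact Emscripten implementation.
var UTF8Decoder = typeof TextDecoder != 'undefined' ? new TextDecoder() : undefined;

function utf8ToString(bytes) {
  if (UTF8Decoder) {
    // The "conditional usage": one truthiness check plus a native decode.
    return UTF8Decoder.decode(bytes);
  }
  // The fallback, still needed for audio worklets and JS shell environments:
  // a hand-written UTF-8 decoder (simplified here to code points <= 0xFFFF).
  var str = '';
  for (var i = 0; i < bytes.length;) {
    var b0 = bytes[i++];
    if (b0 < 0x80) { str += String.fromCharCode(b0); continue; }
    var b1 = bytes[i++] & 0x3f;
    if ((b0 & 0xe0) == 0xc0) { str += String.fromCharCode(((b0 & 0x1f) << 6) | b1); continue; }
    var b2 = bytes[i++] & 0x3f;
    str += String.fromCharCode(((b0 & 0x0f) << 12) | (b1 << 6) | b2);
  }
  return str;
}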

@sbc100 sbc100 requested review from RReverser and kripken July 14, 2025 21:35
@sbc100 sbc100 force-pushed the TEXTDECODER_zero branch from 75c8efd to fe847ae on July 14, 2025 21:37
Collaborator Author

@sbc100 sbc100 left a comment

The code size changes here are because of the removal of the comments in the output (the only tests affected were the unoptimized output tests, which include comments).

@sbc100 sbc100 force-pushed the TEXTDECODER_zero branch from fe847ae to 7685ad0 on July 14, 2025 22:51
Member

@kripken kripken left a comment

After this, I wonder if we can enable mode 2 when ENVIRONMENT / the browser versions imply that?

@sbc100
Collaborator Author

sbc100 commented Jul 14, 2025

After this, I wonder if we can enable mode 2 when ENVIRONMENT / the browser versions imply that?

Yes, I was thinking that too. Or at least we could default to it. I do wonder about the short-string optimization, where we currently opt out of using the TextDecoder for short strings... I wonder whether that is really still an optimization on modern browsers.
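
The short-string opt-out being questioned looks roughly like the following sketch; the 16-byte threshold and the decodeUtf8InJs helper are illustrative placeholders, not necessarily what Emscripten ships today.

// Sketch of the short-string heuristic; threshold and helper name are
// placeholders for illustration.
var UTF8Decoder = typeof TextDecoder != 'undefined' ? new TextDecoder() : undefined;

function utf8ToStringWithHeuristic(heap, idx, endPtr) {
  if (endPtr - idx > 16 && UTF8Decoder) {
    // Longer strings: hand off to the native decoder.
    return UTF8Decoder.decode(heap.subarray(idx, endPtr));
  }
  // Short strings: decode in JS, on the assumption that the per-call overhead
  // of TextDecoder.decode() outweighs its benefit for tiny inputs.
  return decodeUtf8InJs(heap, idx, endPtr); // hypothetical JS decode loop
}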

@sbc100 sbc100 force-pushed the TEXTDECODER_zero branch from 7685ad0 to 3cf19b4 on July 14, 2025 23:53
@sbc100 sbc100 merged commit b490c44 into emscripten-core:main Jul 14, 2025
4 of 13 checks passed
@sbc100 sbc100 deleted the TEXTDECODER_zero branch July 14, 2025 23:53
@arsnyder16
Contributor

@sbc100 @kripken Just want to point out that TEXTDECODER=0 was being used to work around some bugs that haven't really been fully understood. Maybe those bugs have disappeared, maybe they still exist, but the workaround no longer exists.

#18034 (comment)

#16994 (comment)

@sbc100
Collaborator Author

sbc100 commented Jul 15, 2025

@sbc100 @kripken Just want to point out that TEXTDECODER=0 was being used to work around some bugs that haven't really been fully understood. Maybe those bugs have disappeared, maybe they still exist, but the workaround no longer exists.

#18034 (comment)

#16994 (comment)

Oh wow, I didn't remember those. It looks like #16994 can now be closed. I'm not sure about #18034... that looks very odd, and is hopefully fixed.

@arsnyder16
Contributor

I believe #18034 is likely fixed indirectly by #22844. Thanks! I am good with going ahead with this and can remove the TEXTDECODER=0 workaround once I update to 4.0.12+.

@juj
Collaborator

juj commented Aug 31, 2025

I believe this causes a regression for browsers that are unable to use TextDecoder.decode() on SharedArrayBuffers, because Emscripten is unable to detect that (unless it did a test decode and caught an exception).
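
One way to detect that, along the lines juj mentions, would be a one-time probe such as the sketch below (an assumption about how it could be done, not something Emscripten currently does):

// Sketch: probe whether TextDecoder accepts views on a SharedArrayBuffer by
// attempting a tiny test decode and catching the failure.
var textDecoderWorksOnSAB = false;
if (typeof TextDecoder != 'undefined' && typeof SharedArrayBuffer != 'undefined') {
  try {
    new TextDecoder().decode(new Uint8Array(new SharedArrayBuffer(4)));
    textDecoderWorksOnSAB = true;
  } catch (e) {
    // Browsers that reject SAB-backed views end up here; fall back to copying.
  }
}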

@juj
Collaborator

juj commented Aug 31, 2025

Setting this to zero doesn't provide much value to anyone, since the conditional usage is relatively cheap (compared to the fallback) and the API is available in all browsers these days.

Users who want to ensure that their code works on earlier browser versions could set TEXTDECODER=0 and be sure that they wouldn't hit browser bugs.

I am a bit surprised; what was the benefit of removing TEXTDECODER=0 support?

@sbc100
Collaborator Author

sbc100 commented Sep 2, 2025

Setting this to zero doesn't provide much value to anyone, since the conditional usage is relatively cheap (compared to the fallback) and the API is available in all browsers these days.

Users who want to ensure that their code works on earlier browser versions could set TEXTDECODER=0 and be sure that they wouldn't hit browser bugs.

Do such bugs exist?

Couldn't such a user just do TextDecoder = undefined in a pre-js file?

I am a bit surprised; what was the benefit of removing TEXTDECODER=0 support?

It's quite a bit of complexity being removed, and in particular #if/#else conditions, which have a pretty high complexity cost in Emscripten as they make the code hard to reason about (IMHO).

IIRC it also lays the groundwork for followup simplifications, although I don't remember now what they were.
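
For readers unfamiliar with the JS library preprocessor, the removed branching was along these lines (an illustrative sketch, not the literal diff from this PR):

// Illustrative only: the general shape of the #if/#else branching that the
// TEXTDECODER setting required in the JS library code.
#if TEXTDECODER == 2
  // Mode 2: TextDecoder is assumed to exist; no JS fallback is emitted.
  return UTF8Decoder.decode(heapOrArray.subarray(idx, endPtr));
#else
#if TEXTDECODER
  // Mode 1: use TextDecoder when present, otherwise fall through.
  if (UTF8Decoder) {
    return UTF8Decoder.decode(heapOrArray.subarray(idx, endPtr));
  }
#endif
  // Mode 0 (and the fallback for mode 1): hand-written JS decode loop here.
#endif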

@juj
Collaborator

juj commented Sep 2, 2025

Do such bugs exist?

Yes, there is a very long range of browsers that, after adding support for SharedArrayBuffer, didn't yet support TextDecoder on SAB views. E.g. maybe two to three years of Firefox releases at least, and probably a similar span of Chrome releases.

Couldn't such a user just do TextDecoder = undefined in a pre-js file?

The user would have to discover that the bug exists in the old browsers first. And then that would mean that any other use of TextDecoder in site-wide code in the same JS context would not get access to TextDecoder either, even in non-Emscripten content.

Though maybe it won't become a problem for users. For us at least this won't be an issue; we are looking towards WebGPU-only browsers for the future.

@sbc100
Collaborator Author

sbc100 commented Sep 2, 2025

Do such bugs exist?

Yes, there is a very long range of browsers that, after adding support for SharedArrayBuffer, didn't yet support TextDecoder on SAB views. E.g. maybe two to three years of Firefox releases at least, and probably a similar span of Chrome releases.

I thought TextDecoder was simply/officially not supported with SAB, and we have code for dealing with that, don't we?

e.g.

return UTF8Decoder.decode(heapOrArray.buffer ? {{{ getUnsharedTextDecoderView('heapOrArray', 'idx', 'endPtr') }}} : new Uint8Array(heapOrArray.slice(idx, endPtr)))

Are there browsers that do support this and wouldn't need the call to getUnsharedTextDecoderView here?

Couldn't such a user just do TextDecoder = undefined in a pre-js file?

The user would have to discover that the bug exists in the old browsers first. And then that would mean that any other use of TextDecoder in site-wide code in the same JS context would not get access to TextDecoder either, even in non-Emscripten content.

True, asking users to do TextDecoder = undefined is probably not reasonable.

Better for us to handle any TextDecoder bugs if we find any. If there are TextDecoder bugs that we don't handle, I'm happy to add workarounds for them.

Though maybe it won't become a problem for users. For us at least this won't be an issue; we are looking towards WebGPU-only browsers for the future.

@sbc100
Collaborator Author

sbc100 commented Sep 2, 2025

(Like the workaround we already have for the SAB limitation)

@juj
Collaborator

juj commented Sep 2, 2025

and we have code for dealing with that, don't we?

Ohhh. I didn't recall that at all. Then never mind, nothing to see here. (Performance might be appalling with the {{{ getUnsharedTextDecoderView('heapOrArray', 'idx', 'endPtr') }}} construct though.)
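
(For context, the cost being alluded to comes from the copy: on a shared heap the construct has to produce a non-shared buffer before decoding, roughly as in this fragment, which is a sketch rather than the exact macro expansion.)

// Rough idea only. heap: a Uint8Array view on a SharedArrayBuffer;
// UTF8Decoder: a TextDecoder instance. TypedArray.prototype.slice() allocates
// a fresh, non-shared ArrayBuffer and copies the bytes, so every decode of a
// shared heap pays for a copy.
var unsharedCopy = heap.slice(idx, endPtr);
var str = UTF8Decoder.decode(unsharedCopy);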

@RReverser
Collaborator

I thought TextDecoder was simply/officially not supported with SAB, and we have code for dealing with that, don't we?

FWIW that's not entirely correct. The spec has allowed TextDecoder to work with SABs for a while now; it's just that none of the browsers have bothered to prioritise and implement it, so we are stuck with the slow clone workaround.

I wonder if Emscripten could nudge someone to help it happen at Chrome.

Better for us to handle any TextDecoder bugs if we find any. If there are TextDecoder bugs that we don't handle, I'm happy to add workarounds for them.

I saw this bug in wasm-bindgen today and thought of this thread: wasm-bindgen/wasm-bindgen#4471

I suspect Emscripten will run into the same issue after decoding 2GB worth of strings, as we also use a shared TextDecoder instance (like everyone else). Might be worth testing and adding a workaround.

@juj
Collaborator

juj commented Sep 4, 2025

Uhh, what a bug.

I think it would be best to restore TEXTDECODER=0 then.

@RReverser
Collaborator

RReverser commented Sep 4, 2025

I think it's worth adding a workaround anyway; counting how much we've decoded (or catching the error) is not that hard.
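
A workaround of that shape could look like this sketch; the byte budget and the recycle-on-error behaviour are assumptions for illustration, not a committed design:

// Sketch: recycle the shared TextDecoder after a cumulative byte budget, and
// also on error, to sidestep decoder-lifetime bugs like the one linked above.
var UTF8Decoder = new TextDecoder();
var decodedBytes = 0;
var DECODER_BYTE_BUDGET = 0x7fffffff; // ~2GB; illustrative value

function decodeWithRecycling(view) {
  if (decodedBytes + view.length > DECODER_BYTE_BUDGET) {
    UTF8Decoder = new TextDecoder();
    decodedBytes = 0;
  }
  try {
    var str = UTF8Decoder.decode(view);
    decodedBytes += view.length;
    return str;
  } catch (e) {
    // If the long-lived decoder got into a bad state, retry with a fresh one.
    UTF8Decoder = new TextDecoder();
    decodedBytes = view.length;
    return UTF8Decoder.decode(view);
  }
}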

@sbc100
Collaborator Author

sbc100 commented Sep 4, 2025

I thought TextDecoder was simply/officially not supported with SAB, and we have code for dealing with that, don't we?

FWIW that's not entirely correct. The spec has allowed TextDecoder to work with SABs for a while now; it's just that none of the browsers have bothered to prioritise and implement it, so we are stuck with the slow clone workaround.

Oh, so one day, once browsers start implementing it, we can probe for support for this? Is there some way to track this feature, or to know when one of the browsers implements it?

I wonder if Emscripten could nudge someone to help it happen at Chrome.

Better for us to handle any TextDecoder bugs if we find any. If there are TextDecoder bugs that we don't handle, I'm happy to add workarounds for them.

I saw this bug in wasm-bindgen today and thought of this thread: wasm-bindgen/wasm-bindgen#4471

I suspect Emscripten will run into the same issue after decoding 2GB worth of strings, as we also use a shared TextDecoder instance (like everyone else). Might be worth testing and adding a workaround.

@sbc100
Collaborator Author

sbc100 commented Sep 4, 2025

I guess we need to write a test that decodes 3GB of data now. Great. Thanks, Safari.

@RReverser
Collaborator

Oh, so one day, once browsers start implementing it, we can probe for support for this? Is there some way to track this feature, or to know when one of the browsers implements it?

Yup, you can see AllowSharedBufferSource in the spec: https://encoding.spec.whatwg.org/#textdecoder

For individual bugs, you'd have to track them down in the corresponding browser bug trackers, but e.g. it's already implemented in Node.js: nodejs/node#32203

I guess we need to write a test that decodes 3GB of data now. Great. Thanks, Safari.

IIUC, it's also enough to feed the same small chunk of data in a loop until you reach a total of 3GB; you don't necessarily need to allocate that much.
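
For example, such a test might be shaped like this sketch (names and sizes are illustrative):

// Sketch: decode the same small chunk repeatedly until the cumulative total
// passes ~3GB, without ever allocating 3GB at once.
var decoder = new TextDecoder();
var chunk = new Uint8Array(1 << 20).fill(0x41); // 1 MiB of 'A'
var total = 0;
var target = 3 * 1024 * 1024 * 1024; // 3GB cumulative
while (total < target) {
  if (decoder.decode(chunk).length !== chunk.length) {
    throw new Error('TextDecoder output went wrong after ' + total + ' bytes');
  }
  total += chunk.length;
}
console.log('decoded ' + total + ' bytes cumulatively without issues');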
