fix(webapp): charset-aware decoding in the file viewer #727
Merged
Response.text() always decodes as UTF-8 per the Fetch spec, which garbles files whose Content-Type includes a non-UTF-8 charset (e.g. text/plain; charset=utf-16be). Fix: fetch the raw bytes via arrayBuffer() and decode with TextDecoder using the charset extracted from the Content-Type response header, falling back to utf-8 when absent. Extract the charset-parsing regex into a standalone charsetFromContentType() helper in utils.js so it can be unit-tested independently. Add 11 unit tests covering UTF-16BE, UTF-16LE, ISO-8859-1, Windows-1252, case-insensitivity, extra params, and null/empty inputs. Update webapp/ARCHITECTURE.md with a new "Charset-Aware Text Decoding" section.
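The approach described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code merged in this PR: the regex, the null return value of the helper, the decodeBody and fetchText names, and the fallback to UTF-8 on an unrecognised charset label are all assumptions. Only charsetFromContentType(), arrayBuffer(), TextDecoder, and the utf-8 default come from the description.

```javascript
// Hypothetical sketch of the helper; the actual regex in utils.js may differ.
// Returns the lower-cased charset label, or null when none is present.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  // Match the "charset=" parameter anywhere in the parameter list, case-insensitively,
  // with optional surrounding double quotes.
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Assumed wrapper: decode raw bytes using the advertised charset, defaulting to utf-8.
function decodeBody(buffer, contentType) {
  const encoding = charsetFromContentType(contentType) || 'utf-8';
  try {
    return new TextDecoder(encoding).decode(buffer);
  } catch (err) {
    // TextDecoder throws a RangeError for labels it does not recognise.
    // Falling back to utf-8 here is an assumption, not something the PR states.
    return new TextDecoder('utf-8').decode(buffer);
  }
}

// Assumed usage, standing in for what viewFile() does with the response.
async function fetchText(url) {
  const resp = await fetch(url);
  return decodeBody(await resp.arrayBuffer(), resp.headers.get('Content-Type'));
}
```

Keeping the header parsing in a pure function, separate from the fetch call, is what lets it be unit-tested without a network or DOM.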
What
The inline file viewer was displaying garbled content for files encoded in non-UTF-8 charsets (UTF-16BE, UTF-16LE, ISO-8859-1, etc.). The viewer showed raw null-byte-separated characters instead of the actual text.
Why
Response.text() in the Fetch API always decodes the response body as UTF-8, regardless of what the server advertises in the Content-Type header. Since Plik's server correctly sets charset=utf-16be (or similar), the fix is to honour that header.
Changes
- webapp/src/views/DownloadView.vue: Replaced resp.text() with arrayBuffer() + new TextDecoder(encoding) in viewFile(), using the charset from the Content-Type response header (falls back to utf-8 when absent)
- webapp/src/utils.js: Added charsetFromContentType(contentType), a pure helper that parses the charset= parameter out of a Content-Type header value
- webapp/src/__tests__/utils.test.js: 11 new unit tests for charsetFromContentType covering UTF-16BE/LE, ISO-8859-1, Windows-1252, case-insensitivity, extra params, and null/empty inputs
- webapp/ARCHITECTURE.md: New "Charset-Aware Text Decoding" section documenting the approach and the gotcha around UTF-16 BOM handling
Testing
- make test-frontend
- make frontend
- text/plain; charset=utf-16be content type: viewer now renders correctly instead of showing garbled output