fix(webapp): charset-aware decoding in the file viewer #727

Merged
camathieu merged 1 commit into master from fix/utf16-file-viewer
Apr 1, 2026

@camathieu
Member

What

The inline file viewer was displaying garbled content for files encoded in non-UTF-8 charsets (UTF-16BE, UTF-16LE, ISO-8859-1, etc.). For UTF-16 files in particular, the viewer showed null-byte-separated characters instead of the actual text.

Why

Response.text() in the Fetch API always decodes the response body as UTF-8, regardless of what the server advertises in the Content-Type header. Since Plik's server correctly sets charset=utf-16be (or similar), the fix is to honour that header.
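The mismatch can be reproduced without a network call. The sketch below is illustrative only and is not code from this PR: it decodes the same four bytes (the string "Hi" in UTF-16BE) once as UTF-8, which is what Response.text() effectively does, and once with the advertised charset. It assumes a runtime whose TextDecoder supports the utf-16be label (browsers, and Node.js builds with ICU).

```javascript
// "Hi" encoded as UTF-16BE: two bytes per character, high byte first.
const bytes = new Uint8Array([0x00, 0x48, 0x00, 0x69]);

// Decoding as UTF-8 turns every 0x00 byte into a U+0000 character.
const asUtf8 = new TextDecoder('utf-8').decode(bytes);

// Decoding with the charset the server advertised recovers the text.
const asUtf16be = new TextDecoder('utf-16be').decode(bytes);

console.log(JSON.stringify(asUtf8));    // "\u0000H\u0000i"
console.log(JSON.stringify(asUtf16be)); // "Hi"
```

The interleaved U+0000 characters in the first result are the "null-byte-separated" output described under What.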

Changes

  • webapp/src/views/DownloadView.vue: Replaced resp.text() with arrayBuffer() + new TextDecoder(encoding) in viewFile(), using the charset from the Content-Type response header (falls back to utf-8 when absent)
  • webapp/src/utils.js: Added charsetFromContentType(contentType) — a pure helper that parses the charset= parameter out of a Content-Type header value
  • webapp/src/__tests__/utils.test.js: 11 new unit tests for charsetFromContentType covering UTF-16BE/LE, ISO-8859-1, Windows-1252, case-insensitivity, extra params, and null/empty inputs
  • webapp/ARCHITECTURE.md: New "Charset-Aware Text Decoding" section documenting the approach and the gotcha around UTF-16 BOM handling
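For readers without the diff at hand, the approach listed above can be sketched roughly as follows. This is a hypothetical re-implementation, not the merged code: the regex, the choice to return null when no charset is present, and the decodeResponseText name are all assumptions. Only the charsetFromContentType name, the arrayBuffer() + TextDecoder combination, and the utf-8 fallback come from the description.

```javascript
// Hypothetical sketch of the helper; the real one in webapp/src/utils.js may differ.
// Returns the lowercased charset parameter, or null when none is present (assumed contract).
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  // Match charset=<value>, optionally quoted, case-insensitively.
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical wrapper name: decode a fetch Response using its advertised charset,
// falling back to utf-8 when the header carries no charset parameter.
async function decodeResponseText(resp) {
  const charset = charsetFromContentType(resp.headers.get('Content-Type')) || 'utf-8';
  const buffer = await resp.arrayBuffer();
  return new TextDecoder(charset).decode(buffer);
}

console.log(charsetFromContentType('text/plain; charset=utf-16be')); // utf-16be
console.log(charsetFromContentType('text/plain'));                   // null
```

Keeping the parsing in a pure function is what lets it be unit-tested without mocking fetch. Note that new TextDecoder(label) throws a RangeError for labels it does not recognise, which this sketch does not handle.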

Testing

  • 246/246 unit tests passing (make test-frontend)
  • Frontend builds clean (make frontend)
  • Manual: upload a file with text/plain; charset=utf-16be content type — viewer now renders correctly instead of showing garbled output

Response.text() always decodes as UTF-8 per the Fetch spec, which
garbles files whose Content-Type includes a non-UTF-8 charset
(e.g. text/plain; charset=utf-16be).

Fix: fetch the raw bytes via arrayBuffer() and decode with
TextDecoder using the charset extracted from the Content-Type
response header, falling back to utf-8 when absent.

Extract the charset-parsing regex into a standalone
charsetFromContentType() helper in utils.js so it can be unit-tested
independently. Add 11 unit tests covering UTF-16BE, UTF-16LE,
ISO-8859-1, Windows-1252, case-insensitivity, extra params, and
null/empty inputs. Update webapp/ARCHITECTURE.md with a new
"Charset-Aware Text Decoding" section.
@camathieu camathieu merged commit 9ef913f into master Apr 1, 2026
8 checks passed