fix(webapp): charset-aware decoding in the file viewer by camathieu · Pull Request #727 · root-gg/plik

camathieu · 2026-04-01T09:26:36Z

What

The inline file viewer displayed garbled content for files encoded in non-UTF-8 charsets (UTF-16BE, UTF-16LE, ISO-8859-1, etc.). For UTF-16 files, for example, the viewer showed the characters interleaved with null bytes instead of the actual text.

Why

Response.text() in the Fetch API always decodes the response body as UTF-8, regardless of the charset the server advertises in the Content-Type header. Plik's server already sets the correct charset (for example charset=utf-16be), so the fix is to honour that header on the client.
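To make the failure mode concrete, here is a minimal illustration (not code from the PR) of what UTF-8 decoding, which is what Response.text() applies, does to UTF-16BE bytes. It uses only the standard TextDecoder API and should run in browsers and in Node.js builds that include full ICU data, which the utf-16be label needs.

```javascript
// Illustration only: "Hi" encoded as UTF-16BE is the byte sequence 00 48 00 69.
const utf16beBytes = new Uint8Array([0x00, 0x48, 0x00, 0x69]);

// Response.text() always decodes as UTF-8, which is equivalent to this:
// each 0x00 byte becomes a U+0000 character between the real letters.
const garbled = new TextDecoder("utf-8").decode(utf16beBytes);

// Decoding with the charset the server advertised recovers the text.
const correct = new TextDecoder("utf-16be").decode(utf16beBytes);

console.log(JSON.stringify(garbled)); // "\u0000H\u0000i"
console.log(JSON.stringify(correct)); // "Hi"
```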

Changes

webapp/src/views/DownloadView.vue: Replaced resp.text() with arrayBuffer() + new TextDecoder(encoding) in viewFile(), using the charset from the Content-Type response header (falls back to utf-8 when absent)
webapp/src/utils.js: Added charsetFromContentType(contentType) — a pure helper that parses the charset= parameter out of a Content-Type header value
webapp/src/__tests__/utils.test.js: 11 new unit tests for charsetFromContentType covering UTF-16BE/LE, ISO-8859-1, Windows-1252, case-insensitivity, extra params, and null/empty inputs
webapp/ARCHITECTURE.md: New "Charset-Aware Text Decoding" section documenting the approach and the gotcha around UTF-16 BOM handling
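A helper like charsetFromContentType() might look roughly like the sketch below. This is a hypothetical reconstruction based only on the cases listed above (UTF-16BE/LE, ISO-8859-1, Windows-1252, case-insensitivity, extra parameters, null/empty input); the actual regex and return values in webapp/src/utils.js may differ, and returning null when no charset is present and lower-casing the result are assumptions.

```javascript
// Hypothetical sketch of charsetFromContentType(); the real helper in
// webapp/src/utils.js may use a different regex or return value.
function charsetFromContentType(contentType) {
  // null, undefined and "" carry no charset.
  if (!contentType) return null;
  // Find a ";"-delimited charset parameter (case-insensitive). The value may
  // be quoted and stops at the next ";" or whitespace, so other parameters
  // before or after it are ignored.
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  // Lower-casing is an assumption; TextDecoder accepts labels in any case.
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType("text/plain; charset=UTF-16BE")); // utf-16be
console.log(charsetFromContentType('text/plain; format=flowed; charset="iso-8859-1"')); // iso-8859-1
console.log(charsetFromContentType("text/plain")); // null
```

Keeping the parser a pure function of the header string is what makes it testable without a network round trip; the caller decides what to do when it returns null.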

Testing

246/246 unit tests passing (make test-frontend)
Frontend builds cleanly (make frontend)
Manual: after uploading a file served as text/plain; charset=utf-16be, the viewer renders it correctly instead of showing garbled output

Response.text() always decodes as UTF-8 per the Fetch spec, which garbles files whose Content-Type includes a non-UTF-8 charset (e.g. text/plain; charset=utf-16be). Fix: fetch the raw bytes via arrayBuffer() and decode with TextDecoder using the charset extracted from the Content-Type response header, falling back to utf-8 when absent. Extract the charset-parsing regex into a standalone charsetFromContentType() helper in utils.js so it can be unit-tested independently. Add 11 unit tests covering UTF-16BE, UTF-16LE, ISO-8859-1, Windows-1252, case-insensitivity, extra params, and null/empty inputs. Update webapp/ARCHITECTURE.md with a new "Charset-Aware Text Decoding" section.
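The decode step itself can be sketched as a small synchronous function over the raw bytes. The name decodeWithCharset is hypothetical and not part of the PR. The UTF-8 fallback for a missing charset follows the description above; additionally falling back when TextDecoder rejects an unknown label (it throws a RangeError) is an assumption this sketch adds, not something the PR describes.

```javascript
// Hypothetical sketch of the decode step; `charset` is the value parsed from
// the Content-Type header, or null/undefined when the header had none.
function decodeWithCharset(buffer, charset) {
  let decoder;
  try {
    // Fall back to UTF-8 when no charset was advertised.
    decoder = new TextDecoder(charset || "utf-8");
  } catch (err) {
    // Assumption (not described in the PR): TextDecoder throws a RangeError
    // for labels it does not recognise, so fall back to UTF-8 then as well.
    if (!(err instanceof RangeError)) throw err;
    decoder = new TextDecoder("utf-8");
  }
  return decoder.decode(buffer);
}

// "caf\u00e9": the final byte 0xE9 is "é" in ISO-8859-1 but invalid UTF-8.
const latin1 = new Uint8Array([0x63, 0x61, 0x66, 0xe9]).buffer;
console.log(decodeWithCharset(latin1, "iso-8859-1")); // café
console.log(decodeWithCharset(new TextEncoder().encode("ok"), null)); // ok
```

In a viewFile()-style caller this would be fed with await resp.arrayBuffer() and the parsed charset of resp.headers.get("Content-Type"); Headers.get() returns null when the header is missing, which lands on the UTF-8 fallback.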

camathieu merged commit 9ef913f into master Apr 1, 2026
8 checks passed

camathieu merged 1 commit into master from fix/utf16-file-viewer