A Neovim plugin that detects Big5-encoded files and converts them to UTF-8 in place.
Requirements:
- Neovim 0.8.0 or later
- No external dependencies
Using lazy.nvim:
{
  "your-username/Big5ToNeovim",
  config = function()
    require("big5").setup()
  end,
}

Using pckr.nvim:
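-- pckr.nvim registers plugins through require("pckr").add() rather than packer.nvim's use().
require("pckr").add({
  {
    "your-username/Big5ToNeovim",
    config = function()
      require("big5").setup()
    end,
  },
})

Manual installation: clone the repository and add its directory to your runtimepath, then call require("big5").setup() in your Neovim configuration.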
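
A minimal sketch of the manual setup, assuming the repository was cloned to ~/src/Big5ToNeovim (a hypothetical path; substitute your own clone location):

```lua
-- Hypothetical clone location, not a path the plugin requires.
local plugin_dir = vim.fn.expand("~/src/Big5ToNeovim")

-- Make the plugin's lua/ directory discoverable by require().
vim.opt.runtimepath:append(plugin_dir)

require("big5").setup()
```

The configuration options accepted by setup() are: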
require("big5").setup({
  -- Enable automatic detection notification when a Big5 file is opened.
  -- Default: false
  auto_detect = false,
  -- Prompt for confirmation when converting a file that contains byte sequences
  -- that cannot be converted. Invalid bytes are replaced with a substitution
  -- character whose exact value is platform-dependent: U+FFFD on Linux/macOS,
  -- "?" on Windows.
  -- Default: true
  confirm_conversion = true,
})

Reports whether the current file appears to be Big5-encoded. Reads raw bytes from disk (does not modify the file or buffer).
:Big5Check
Output examples:
File appears to be Big5-encoded (ratio: 95%, sequences: 248)
File does not appear to be Big5-encoded.
Converts the current file from Big5 to UTF-8 in place. Overwrites the file on disk and reloads the buffer.
Warning: This operation is irreversible. The original Big5 file is overwritten with no backup. Back up any important files before running this command.
:Big5ToUtf8
Behavior:
- If the buffer has unsaved changes, prompts before proceeding.
- Checks if the file is Big5-encoded. If not, notifies and exits without changes.
- Converts the file content from Big5 to UTF-8 in memory.
- If the file contains byte sequences that cannot be converted and confirm_conversion is enabled, prompts for confirmation before writing.
- Writes the UTF-8 content to disk and reloads the buffer.
- Sets fileencoding to utf-8 for the current buffer.
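
Because the conversion overwrites the file and the plugin creates no backup, one option is to copy the file yourself first. The helper below is a hypothetical sketch, not part of the plugin; the name backup_file and the choice of destination path are assumptions.

```lua
-- Hypothetical helper (not provided by the plugin): copies src to dst
-- byte-for-byte. Returns true on success, or nil plus an error message.
local function backup_file(src, dst)
  -- Binary mode so Big5 bytes are never altered by newline translation.
  local input, read_err = io.open(src, "rb")
  if not input then
    return nil, read_err
  end
  local data = input:read("*a")
  input:close()

  local output, write_err = io.open(dst, "wb")
  if not output then
    return nil, write_err
  end
  output:write(data)
  output:close()
  return true
end
```

From Neovim, this could be called with vim.api.nvim_buf_get_name(0) as the source and a path of your choosing as the destination before running :Big5ToUtf8.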
Detection uses a sample-based heuristic:
- Reads the first 8 KB of the file.
- If the sample is valid UTF-8, the file is classified as not-Big5.
- Scans for Big5 double-byte sequences (lead byte 0x81-0xFE followed by a valid trail byte 0x40-0x7E or 0xA1-0xFE).
- If at least 80% of candidate high-byte sequences are valid Big5 pairs, the file is classified as Big5.
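
The steps above can be sketched in plain Lua as follows. This is an illustration under stated assumptions, not the plugin's actual code: the function names are hypothetical, the UTF-8 check is simplified (it does not reject overlong forms or surrogates), and every byte at or above 0x80 is counted as a candidate.

```lua
-- Illustrative sketch only; names and details are assumptions.
local SAMPLE_SIZE = 8 * 1024 -- first 8 KB of the file
local THRESHOLD = 0.8        -- at least 80% of candidates must be valid pairs

-- Simplified structural UTF-8 check: lead byte plus the right number of
-- continuation bytes. A sequence cut off at the end of s counts as invalid.
local function is_valid_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local extra
    if b < 0x80 then extra = 0
    elseif b >= 0xC2 and b <= 0xDF then extra = 1
    elseif b >= 0xE0 and b <= 0xEF then extra = 2
    elseif b >= 0xF0 and b <= 0xF4 then extra = 3
    else return false end
    for j = i + 1, i + extra do
      local c = s:byte(j)
      if not c or c < 0x80 or c > 0xBF then return false end
    end
    i = i + extra + 1
  end
  return true
end

-- Trail byte ranges: 0x40-0x7E or 0xA1-0xFE.
local function is_trail(b)
  return b ~= nil and ((b >= 0x40 and b <= 0x7E) or (b >= 0xA1 and b <= 0xFE))
end

-- Returns is_big5, ratio, valid_pair_count for the given raw bytes.
local function looks_like_big5(bytes)
  local sample = bytes:sub(1, SAMPLE_SIZE)
  if is_valid_utf8(sample) then
    return false, 0, 0
  end
  local candidates, valid = 0, 0
  local i, n = 1, #sample
  while i <= n do
    local b = sample:byte(i)
    if b >= 0x80 then
      -- Assumption: any high byte starts a candidate sequence.
      candidates = candidates + 1
      if b >= 0x81 and b <= 0xFE and is_trail(sample:byte(i + 1)) then
        valid = valid + 1
        i = i + 2 -- consume lead and trail byte together
      else
        i = i + 1
      end
    else
      i = i + 1
    end
  end
  if candidates == 0 then
    return false, 0, 0
  end
  local ratio = valid / candidates
  return ratio >= THRESHOLD, ratio, valid
end
```

The second and third return values correspond loosely to the ratio and sequence count that :Big5Check reports; how the plugin computes those figures exactly is not specified here.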
The test suite uses busted via plenary.nvim.
First, generate the test fixtures:
lua test/fixtures/generate_fixtures.lua

Then run the tests (ensure plenary.nvim is available):

nvim --headless -c "PlenaryBustedDirectory test/ {minimal_init = 'test/minimal_init.lua'}" -c "qa"

Or set PLENARY_PATH if plenary is not in a standard location:
PLENARY_PATH=/path/to/plenary.nvim nvim --headless \
  -c "PlenaryBustedDirectory test/ {minimal_init = 'test/minimal_init.lua'}" \
  -c "qa"

This plugin handles standard Big5 encoding only. The following are explicitly out of scope for v1:
- Big5-HKSCS (Hong Kong variant)
- Batch/directory conversion
- Other encodings (GB2312, GBK, Shift_JIS, etc.)
- UTF-8 to Big5 reverse conversion
- Backup file creation before conversion
License: MIT