On Windows, when the input text is UTF-8 encoded and a message printed by fread() includes part of that text, that part is displayed as garbled characters. I believe the cause is that we didn't mark the txt with the declared encoding "UTF-8".
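To illustrate what "marking" means here, the following is a minimal base R sketch, not the actual fread() code path. It builds the UTF-8 bytes of the footer text from the example in this report, shows that the resulting string carries no declared encoding, and then marks it. The variable names are only for illustration.

```r
# UTF-8 bytes of the footer text "中文3" (U+4E2D, U+6587, then ASCII "3")
raw_bytes <- as.raw(c(0xE4, 0xB8, 0xAD, 0xE6, 0x96, 0x87, 0x33))

# rawToChar() returns a string with no declared encoding, so R treats
# the bytes as native-encoded when printing (CP936 in the session above)
unmarked <- rawToChar(raw_bytes)
Encoding(unmarked)  # "unknown"

# Marking does not change the bytes; it only tells R how to read them
marked <- unmarked
Encoding(marked) <- "UTF-8"
Encoding(marked)    # "UTF-8"

# Same bytes before and after marking
identical(charToRaw(unmarked), charToRaw(marked))  # TRUE
```

In a UTF-8 locale both strings print the same, which is presumably why the problem only shows up on Windows with a non-UTF-8 native encoding.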
A reproducible example on Windows
Code
txt <- "A,B\n中文1,中文2\n中文3"
txt <- enc2utf8(txt)
data.table::fread(text = txt, encoding = 'UTF-8')
Output
A B
1: 中文1 中文2
Warning message:
In data.table::fread(text = txt, encoding = "UTF-8") :
Discarded single-line footer: <<涓枃3>>
In contrast, with natively encoded txt the message is displayed correctly
Code
txt <- "A,B\n中文1,中文2\n中文3"
data.table::fread(text = txt)
Output
A B
1: 中文1 中文2
Warning message:
In data.table::fread(text = txt) : Discarded single-line footer: <<中文3>>
Session info
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_1.5 data.table_1.13.0
Another example on Mac
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
txt <- sprintf("A,B\n%s,%s\n%s", x, x, x)
Encoding(txt) <- "UTF-8"
data.table::fread(text = txt, encoding = 'UTF-8')
txt2 <- iconv(txt, "UTF-8", "latin1")
data.table::fread(text = txt2, encoding = 'Latin-1')