
fwrite encoding problem with file names #3078

@dpprdan

fwrite() cannot handle umlauts (and presumably any non-ASCII characters) in file names and paths on Windows. I see this with LC_COLLATE=German_Germany.1252, but in my experience it is also a problem in other non-UTF-8 locales.

When the umlaut is in the file name, fwrite writes the file, but with a faulty file name.

library(data.table)
setwd(tempdir())
DF = data.frame(A=1:3, B=c("foo","A,Name","baz"))
fwrite(DF, "töst.csv")
list.files(pattern = "\\.csv")
#> [1] "töst.csv"

When the umlaut is in the path, fwrite cannot write the file at all.

dir.create("ä")
data.table::fwrite(DF, "ä/test.csv")
#> Error in data.table::fwrite(DF, "ä/test.csv"): No such file or directory: 'ä/test.csv'. Unable to create new file for writing (it does not exist already). Do you have permission to write here, is there space on the disk and does the path exist?

I stepped through the R code in the debugger, and up until line 67 the file argument is encoded as "UTF-8" (as it should be, IMO) and looks fine. So my guess is that the file path's encoding goes wrong in the CfwriteR code.

From looking at the characters that should be "ö" or "ä" respectively, the problem seems to be that CfwriteR gets a UTF-8 string but handles it as if it were encoded as latin-1, see this table.
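The suspected mix-up can be reproduced in plain R without data.table. The following is a minimal sketch that reinterprets the UTF-8 bytes of "ö" and "ä" as latin-1; the helper name `misread_as_latin1` is made up for illustration and is not part of any package.

```r
# Unicode escapes keep the example independent of this file's own encoding.
chars <- c("\u00f6", "\u00e4")  # "ö" and "ä"

# Hypothetical helper: take the UTF-8 bytes of a string and decode them
# as if they were latin-1, which is what the C code appears to be doing.
misread_as_latin1 <- function(x) {
  bytes <- charToRaw(enc2utf8(x))
  iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")
}

# Hex representation of the UTF-8 bytes, e.g. "c3 b6" for "ö".
utf8_hex <- function(x) paste(charToRaw(enc2utf8(x)), collapse = " ")

garbled <- vapply(chars, misread_as_latin1, character(1), USE.NAMES = FALSE)

# One row per character: intended glyph, its UTF-8 bytes, and the mojibake.
print(data.frame(
  intended   = chars,
  utf8_bytes = vapply(chars, utf8_hex, character(1), USE.NAMES = FALSE),
  shown_as   = garbled
))
```

Each two-byte UTF-8 sequence turns into two latin-1 characters ("ö" becomes "Ã¶", "ä" becomes "Ã¤"), which is the kind of garbling one would expect in the file name if this guess is right.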

If the error were in the R code, I would fix it with an Encoding(file) <- "UTF-8" line, but I do not know how this is done in C.
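Until the C side is fixed, one thing that might be worth trying from R is to convert the path to the session's native encoding before calling fwrite. This is only a sketch: `to_native_path` is a hypothetical helper, and it is an untested assumption that this avoids the problem in data.table 1.11.6. It can also only work for characters that exist in the native code page (e.g. 1252).

```r
# Hypothetical helper (not part of data.table): re-encode a single path
# into the session's native encoding via base R's enc2native().
to_native_path <- function(path) {
  stopifnot(is.character(path), length(path) == 1L)
  enc2native(path)
}

# "\u00f6" is "ö"; the escape avoids depending on this file's encoding.
path <- to_native_path(file.path(tempdir(), "t\u00f6st.csv"))

# Only attempt the write if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  DF <- data.frame(A = 1:3, B = c("foo", "A,Name", "baz"))
  data.table::fwrite(DF, path)
  print(list.files(tempdir(), pattern = "\\.csv$"))
}
```

If the listed file name comes out intact with this call but not with the plain UTF-8 string, that would support the guess that the path bytes are passed through to the operating system without re-encoding.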

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language en                          
#>  collate  German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2018-09-27
#> Packages -----------------------------------------------------------------
#>  package    * version date       source                            
#>  base       * 3.5.1   2018-07-02 local                             
#>  compiler     3.5.1   2018-07-02 local                             
#>  data.table * 1.11.6  2018-09-19 CRAN (R 3.5.1)                    
#>  datasets   * 3.5.1   2018-07-02 local                             
#>  devtools     1.13.6  2018-06-27 CRAN (R 3.5.1)                    
#>  digest       0.6.17  2018-09-12 CRAN (R 3.5.1)                    
#>  evaluate     0.11    2018-07-17 CRAN (R 3.5.1)                    
#>  graphics   * 3.5.1   2018-07-02 local                             
#>  grDevices  * 3.5.1   2018-07-02 local                             
#>  htmldeps     0.1.1   2018-07-30 Github (rstudio/htmldeps@c1023e0) 
#>  htmltools    0.3.6   2017-04-28 CRAN (R 3.5.1)                    
#>  knitr        1.20    2018-02-20 CRAN (R 3.5.1)                    
#>  magrittr     1.5     2014-11-22 CRAN (R 3.5.1)                    
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.5.1)                    
#>  methods    * 3.5.1   2018-07-02 local                             
#>  Rcpp         0.12.18 2018-07-23 CRAN (R 3.5.1)                    
#>  rmarkdown    1.10.13 2018-09-04 Github (rstudio/rmarkdown@19008bf)
#>  stats      * 3.5.1   2018-07-02 local                             
#>  stringi      1.2.4   2018-07-20 CRAN (R 3.5.1)                    
#>  stringr      1.3.1   2018-05-10 CRAN (R 3.5.1)                    
#>  tools        3.5.1   2018-07-02 local                             
#>  utils      * 3.5.1   2018-07-02 local                             
#>  withr        2.1.2   2018-03-15 CRAN (R 3.5.1)                    
#>  yaml         2.2.0   2018-07-25 CRAN (R 3.5.1)
