fread from 1.11: self healing regression for fill = T when unmatched quote occurs #2859

@christianhomberg

For the attached file, the following code does not throw a warning in data.table 1.11, whereas 1.10.4 gives the expected warning. Background: the file contains an unmatched quote on line 20000:

dt_2859 = data.table::fread("dt_2859.csv", fill = T)
This results in a warning with the following sessionInfo():
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.4      magrittr_1.5        tools_3.4.4        
[4] yaml_2.1.16         data.table_1.10.4-3 rlang_0.2.0.9001   
[7] purrr_0.2.4    
whereas `fread` stays quiet with the following sessionInfo():
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markovchain_0.6.9.8-1 shiny_1.0.5           data.table_1.11.2    
[4] magrittr_1.5          ggplot2_2.2.1        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16        compiler_3.4.4      pillar_1.2.2        later_0.7.2        
 [5] plyr_1.8.4          tools_3.4.4         digest_0.6.15       packrat_0.4.9-2    
 [9] jsonlite_1.5        evaluate_0.10.1     tibble_1.4.2        gtable_0.2.0       
[13] lattice_0.20-35     pkgconfig_2.0.1     rlang_0.2.0.9001    Matrix_1.2-14      
[17] igraph_1.2.1        parallel_3.4.4      yaml_2.1.19         expm_0.999-2       
[21] stringr_1.3.0       knitr_1.20          stats4_3.4.4        rprojroot_1.3-2    
[25] grid_3.4.4          flexdashboard_0.5.1 R6_2.2.2            rmarkdown_1.9      
[29] matlab_1.0.2        purrr_0.2.4         codetools_0.2-15    backports_1.1.2    
[33] scales_0.5.0        promises_1.0.1      htmltools_0.3.6     mime_0.5           
[37] colorspace_1.3-2    xtable_1.8-2        httpuv_1.4.2        stringi_1.2.2      
[41] RcppParallel_4.4.0  lazyeval_0.2.1      munsell_0.4.3

In version 1.11, only the first 20000 rows are read, and, at least for me, RStudio crashes when trying to print the resulting dt_2859. With version 1.10.4, however, we get a warning and so know to set quote = "".
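Until the warning is back, a check along these lines might help spot the problem before calling `fread`. This is only a sketch: `find_unbalanced_quotes` is a hypothetical helper (not part of data.table), and the tiny temporary file stands in for the attached dt_2859.csv. It flags lines with an odd number of double quotes, then reads the file with `quote = ""` and compares the row count against the number of data lines.

```r
# Hypothetical helper (not a data.table function): return the line numbers
# that contain an odd number of double quotes.
find_unbalanced_quotes <- function(path) {
  lines <- readLines(path, warn = FALSE)
  n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))
  which(n_quotes %% 2L == 1L)
}

# Small stand-in file, NOT the attached dt_2859.csv.
# Line 3 of the file contains an unmatched quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,text", "1,ok", "2,say \"hi", "3,fine"), tmp)

bad_lines <- find_unbalanced_quotes(tmp)

# Number of data rows we expect, assuming one record per line and one header line.
n_data_rows <- length(readLines(tmp, warn = FALSE)) - 1L

if (requireNamespace("data.table", quietly = TRUE)) {
  # quote = "" makes fread treat quote characters as ordinary text.
  dt <- data.table::fread(tmp, fill = TRUE, quote = "")
  if (nrow(dt) != n_data_rows) {
    warning("fread returned fewer or more rows than the file has data lines")
  }
}
```

The odd-count check assumes one record per line, so a file with legitimately quoted fields that span several lines would produce false positives.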
