
fread internal error for nrow=0 on malformed csv #4355

@karldw

# Minimal reproducible example

In the example below, the header line has an extra comma after the b, which causes an internal error with nrow=0. I'm not sure this bug is worth fixing – fread can't ever handle every possible malformation in a CSV – but I wanted to register it.

library(data.table)
fread("a,b,\n1,2", nrow=0)
#> Error in fread("a,b,\n1,2", nrow = 0) : 
#>   Internal error: sampleLines(1) > allocnrow(0)

I expected fread to give a warning, as it does without the nrow argument:

fread("a,b,\n1,2")
#> Warning in fread("a,b,\n1,2") :
#>   Detected 3 column names but the data has 2 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
#>    a b V3
#> 1: 1 2 NA

The behavior when there's a third data column is also interesting:

# Same error with nrow=0
fread("a,b,\n1,2,1",nrow=0)
#> Error in fread("a,b,\n1,2,1", nrow = 0) : 
#>   Internal error: sampleLines(1) > allocnrow(0)

# No trouble otherwise
fread("a,b,\n1,2,1")
#>    a b V3
#> 1: 1 2  1
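
To make the mismatch in the two inputs above explicit, here is a minimal base-R sketch that counts the comma-separated fields on each line. `count_fields` is a hypothetical helper written for this illustration, not part of data.table, and it assumes there are no quoted fields containing commas.

```r
# Hypothetical helper (not part of data.table): count the fields on each line
# by counting separators. Assumes no quoted fields that contain the separator.
count_fields <- function(txt, sep = ",") {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  # Number of separators per line, plus one, gives the number of fields.
  # A trailing separator therefore counts as one extra (empty) field.
  lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE))) + 1L
}

count_fields("a,b,\n1,2")    # header: 3 fields, data row: 2 fields
count_fields("a,b,\n1,2,1")  # header: 3 fields, data row: 3 fields
```

In the first input the header has one more field than the data row, which matches the "Detected 3 column names but the data has 2 columns" warning; in the second input the counts agree, which is consistent with fread reading it without a warning.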

# Output of sessionInfo()

sessionInfo()
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.10

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.6.1
