
fread's nrows=0 errors if input is an empty table #2512

@franknarf1

Description


I have several tables on disk. For each one, I want to read the column names and check them against the expected names, using names(fread(fn, nrows=0)) as suggested in the ?fread documentation for nrows=. However, for any empty table (a header row but no data rows), this gives an error:

# note that this example will write to your current directory
library(data.table)
DT0 = data.table(a = numeric(), b = numeric())
fn0 = "test0.csv"

fwrite(DT0, fn0)

fread(fn0) 
# works fine
fread(fn0, nrows=0)
# Error in fread(fn0, nrows = 0) : 
#   Internal error in line 1848 of fread.c, please report on data.table GitHub:  allocnrow(1) < nrowLimit(0)

It works fine if the table is nonempty, though:

# note that this example will write to your current directory
library(data.table)
DT = data.table(a = numeric(1), b = numeric(1))
fn = "test.csv"

fwrite(DT, fn)

fread(fn, nrows=0) 
# works fine

Tested on...

data.table 1.10.5 IN DEVELOPMENT built 2017-12-08 20:14:33 UTC; travis

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

Verbose output ...

> fread(fn0, nrows=0, verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=8)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as boolean
[02] Opening the file
  Opening file test0.csv
  File opened, size = 5 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  No \n has been found in the data (the entire input was scanned) so \r-only line endings are allowed. This is unusual.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<a,b>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep ...
  sep=','  with 1 lines of 2 fields using quote rule 0
  Detected 2 columns on line 1. This line is either column names or first data row. Line starts as: <<a,b>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 2
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (3 bytes from row 1 to eof) / (2 * 3 jump0size) == 0
  Type codes (jump 000)    : AA  Quote rule 0
  'header' determined to be true because there are no number fields in the first and only row
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : 11
[10] Allocate memory for the datatable
  Allocating 2 column slots (2 - 0 dropped) with 1 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=0
Error in fread(fn0, nrows = 0, verbose = TRUE) : 
  Internal error in line 1848 of fread.c, please report on data.table GitHub:  allocnrow(1) < nrowLimit(0)

(For now, I'll work around this with readLines and strsplit, since I expect to face only CSVs and shouldn't have to deal with quotes. Still, splitting the columns of x="'a,b', AB\n1,2" is handled much better by names(fread(x, quote="'", nrows=0)) than by anything I could hack together, such as strsplit(readLines(textConnection(x), n=1), ", *")[[1]], which breaks the quoted name apart into "'a", "b'" and "AB".)
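A slightly sturdier version of the readLines stopgap, offered only as a sketch: base R's scan() can split the header line while honouring quoted fields, which avoids the strsplit problem with 'a,b'. The helper name header_names() and its sep/quote defaults are hypothetical (not part of data.table), and it assumes the column names are on the first line of the input.

```r
# Hypothetical helper (not part of data.table): return the column names of a
# delimited input by reading only its first line. Assumes the header is line 1.
header_names = function(path, sep = ",", quote = "\"'") {
  # readLines() accepts a file name or a connection such as textConnection(x)
  first = readLines(path, n = 1L, warn = FALSE)
  if (length(first) == 0L) return(character(0))  # completely empty input
  # with a non-default sep, scan() keeps separators inside quotes as part of
  # the field; strip.white = TRUE trims spaces around unquoted fields
  scan(text = first, what = "", sep = sep, quote = quote,
       strip.white = TRUE, quiet = TRUE)
}

con = textConnection("'a,b', AB\n1,2")
header_names(con)
# [1] "a,b" "AB"
close(con)
```

Unlike fread, this does no separator or header detection, so sep and quote have to be known in advance; it is a stopgap, not a replacement for names(fread(fn, nrows=0)).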
