I have several tables on disk. For each table, I want to get the column names to check them against expected names, using names(fread(fn, nrows=0)) as suggested in the ?fread documentation for nrows=. However, for any empty table, this gives an error:
# note that this example will write to your current directory
library(data.table)
DT0 = data.table(a = numeric(), b = numeric())
fn0 = "test0.csv"
fwrite(DT0, fn0)
fread(fn0)
# works fine
fread(fn0, nrows=0)
# Error in fread(fn0, nrows = 0) :
# Internal error in line 1848 of fread.c, please report on data.table GitHub: allocnrow(1) < nrowLimit(0)
It works fine if the table is nonempty, though:
# note that this example will write to your current directory
library(data.table)
DT = data.table(a = numeric(1), b = numeric(1))
fn = "test.csv"
fwrite(DT, fn)
fread(fn, nrows=0)
# works fine
Tested on...
data.table 1.10.5 IN DEVELOPMENT built 2017-12-08 20:14:33 UTC; travis
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.10.5
Verbose output ...
> fread(fn0, nrows=0, verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 8 threads (omp_get_max_threads()=8, nth=8)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as boolean
[02] Opening the file
Opening file test0.csv
File opened, size = 5 bytes.
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
No \n has been found in the data (the entire input was scanned) so \r-only line endings are allowed. This is unusual.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <<a,b>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep ...
sep=',' with 1 lines of 2 fields using quote rule 0
Detected 2 columns on line 1. This line is either column names or first data row. Line starts as: <<a,b>>
Quote rule picked = 0
fill=false and the most number of columns found is 2
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 1 because (3 bytes from row 1 to eof) / (2 * 3 jump0size) == 0
Type codes (jump 000) : AA Quote rule 0
'header' determined to be true because there are no number fields in the first and only row
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 11
[10] Allocate memory for the datatable
Allocating 2 column slots (2 - 0 dropped) with 1 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=0
Error in fread(fn0, nrows = 0, verbose = TRUE) :
Internal error in line 1848 of fread.c, please report on data.table GitHub: allocnrow(1) < nrowLimit(0)
(For now, I'll work around this with readLines and strsplit, since I think I'll only be facing CSVs and won't have to deal with quotes. In general, though, splitting the columns of something like x="'a,b', AB\n1,2" is handled far better by names(fread(x, quote="\'", nrows=0)) than by anything I could hack together, such as strsplit(readLines(textConnection(x), n=1), ", *")[[1]].)
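Another possible stopgap, sketched here under the assumption that a plain fread(fn) keeps succeeding on empty tables as it does above: try the header-only read first and fall back to a full read only when it errors. The helper name read_colnames is made up for illustration and is not part of data.table. The fallback reads the whole file, which is cheap for an empty table but could be slow if the error has some other cause on a large file.

```r
library(data.table)

# Hypothetical helper (not part of data.table): try the cheap header-only
# read first; if that errors, fall back to reading the whole file.
read_colnames <- function(fn) {
  tryCatch(
    names(fread(fn, nrows = 0)),
    error = function(e) names(fread(fn))
  )
}

# Example files written to a temp directory rather than the working directory
fn0 <- tempfile(fileext = ".csv")  # empty table, header only
fn1 <- tempfile(fileext = ".csv")  # table with one row
fwrite(data.table(a = numeric(), b = numeric()), fn0)
fwrite(data.table(a = numeric(1), b = numeric(1)), fn1)

# Check the names on disk against the expected names
expected <- c("a", "b")
ok0 <- identical(read_colnames(fn0), expected)
ok1 <- identical(read_colnames(fn1), expected)
```

Unlike the readLines/strsplit hack, this keeps fread's own handling of separators and quotes, since fread still does the parsing in both branches.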