fread() fails with trailing delimiter

I am working with CSV data files generated by an instrument. The instrument exports three tables, each with two header rows, into the same file. Every line ends with a trailing comma immediately before the end of line (`,EOL`).

For example:

```
   Head 1,   Head 2,   Head 3,   ...,   Head n,
    Sub 1,    Sub 2,    Sub 3,   ...,    Sub n,
character,  numeric,  numeric,   ...,  numeric,
character,  numeric,  numeric,   ...,  numeric,
character,  numeric,  numeric,   ...,  numeric,
< 30,000 - 800,000 rows >
```

Using `read.csv()` (a wrapper around `read.table()`) with a combination of the `skip` and `nrows` arguments to select the contiguous block of lines corresponding to one of the three tables in the file works fine:

```
df1 <- read.csv(file, header=FALSE, skip=20, nrows=27214)
dim(df1)
[1] 27214    43
```
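To make the `read.csv()` behavior concrete, here is a minimal sketch on toy data. The two input lines are made-up values, not rows from my instrument file. It only illustrates that a trailing comma produces one extra column that is entirely NA, which is consistent with the 43 columns reported above.

```r
# Toy input (hypothetical values): every line ends with a trailing comma
lines <- c("a,1.5,2.5,", "b,3.5,4.5,")

# read.csv() treats the empty field after the last comma as a column of NA
df <- read.csv(text = lines, header = FALSE)

dim(df)                      # 2 rows, 4 columns (3 data columns + 1 empty)
all(is.na(df[[ncol(df)]]))   # the extra trailing column contains only NA
```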

Using `fread()` with the same settings returns an error:

```
df <- fread(file, header=FALSE, skip=20, nrows=27214)

Error in fread(file, header = FALSE, skip = 20, nrows = 27214) : 
Expected sep (',') but new line, EOF (or other non printing character) ends 
field 37 on line 22 when detecting types:  P - 20,        ,3.897,133.436,
0.786,1.137,0.046,761.305,0.211,183.300,1.129,1337.282,0.563,385.954,
116117.274,50391.888,166509.163,2.814,2.799,396.083,0.317,4775.659,0.285,
12.336,1288.281,0.867,1.066,0.721,0.377,272.761,997.594,2668.682,1060838.391,
424835.353,1485673.719,10.000,
```

There are actually two problems here.
1. `read.table()` interprets a comma immediately preceding the end of line as an extra column whose values are all NA. Ideally, `fread()` should mimic this behavior and/or provide an option to drop columns that are entirely NA, i.e. `all(is.na(column))`. Removing the trailing commas from every line in my file allows `fread()` to execute successfully.
2. Perhaps more importantly, the line quoted in the error message above is the 5th from the end of the file. When `nrows` is specified, type detection should be constrained to the lines between `skip` and `skip + nrows`. That is, perform type detection on rows `c(1:5) + skip`, the middle 5 rows of that range, and rows `skip + nrows - c(5:1)`.
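For the first point, the workaround I used can be sketched roughly as follows. This is an illustration on toy data, not a proposed `fread()` implementation: `strip_trailing_comma()` and `drop_all_na_cols()` are hypothetical helper names, the input values are made up, and base R's `read.csv()` stands in for the parser so the snippet runs without any packages.

```r
# Hypothetical helper: remove one trailing comma (plus any trailing
# spaces, tabs or carriage returns) from the end of each line
strip_trailing_comma <- function(lines) {
  sub(",[ \t\r]*$", "", lines)
}

# Hypothetical helper: drop columns whose values are all NA
drop_all_na_cols <- function(df) {
  df[, colSums(!is.na(df)) > 0, drop = FALSE]
}

# Toy input (made-up values) with a trailing comma on every line
raw <- c("a,1.5,2.5,", "b,3.5,4.5,")

# Option A: clean the text first, then parse
clean    <- strip_trailing_comma(raw)
df_clean <- read.csv(text = clean, header = FALSE)

# Option B: parse as-is, then drop the all-NA column afterwards
df_raw <- drop_all_na_cols(read.csv(text = raw, header = FALSE))

ncol(df_clean)  # 3
ncol(df_raw)    # 3
```

Option A is what made `fread()` succeed on my file (after writing the cleaned lines back out); Option B is the post-processing that an option in `fread()` could make unnecessary.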

In my particular case, the number of columns is not fixed across the three data tables in the file, so rows outside the `skip`-`nrows` range do not give an accurate representation of the data within the range.
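As a rough sketch of the sampling I have in mind, the line numbers used for type detection could be computed like this. `sample_rows()` is a hypothetical helper, not anything in data.table, and it assumes `nrows` is at least `n`. Unlike the `skip + nrows - c(5:1)` expression above, it shifts the last block by one so that the final sampled line is the last line of the requested range.

```r
# Hypothetical helper: file line numbers to inspect for type detection,
# restricted to the block (skip, skip + nrows]; assumes nrows >= n
sample_rows <- function(skip, nrows, n = 5) {
  first  <- skip + seq_len(n)                       # first n rows of the block
  middle <- skip + (nrows - n) %/% 2 + seq_len(n)   # n rows around the middle
  last   <- skip + nrows - rev(seq_len(n)) + 1      # last n rows of the block
  sort(unique(c(first, middle, last)))
}

# Values taken from the call in this report
rows <- sample_rows(skip = 20, nrows = 27214)
range(rows)  # 21 27234
```

With these indices, no line belonging to one of the other tables in the file would ever be inspected.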

Verbose output from failed `fread()` with trailing commas:

```
fread(file, sep=",", header=FALSE, skip=cellHead, nrows=length(cell), verbose=T)

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.012 GB. 
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Looking for supplied sep ',' on line 21 ('skip' has been supplied) ... found ok
Found 43 columns
First row with 43 fields occurs on line 21 (either column names or first row of data)
'header' changed by user from 'auto' to FALSE
Count of eol after first data row: 27993
Subtracted 2 for last eol and any trailing empty lines, leaving 27991 data rows
nrow limited to nrows passed in (27214)
Type codes (   first 5 rows): 4144444444444444444444444444444444444444410
Type codes (+ middle 5 rows): 4144444444444444444444444444444444444444410
Error in fread(file, sep = ",", header = FALSE, skip = cellHead, nrows = length(cell),  : 
  Expected sep (',') but new line, EOF (or other non printing character) ends field 37 on line 22 when detecting types: P - 20,        ,3.897,133.436,0.786,1.137,0.046,761.305,0.211,183.300,1.129,1337.282,0.563,385.954,116117.274,50391.888,166509.163,2.814,2.799,396.083,0.317,4775.659,0.285,12.336,1288.281,0.867,1.066,0.721,0.377,272.761,997.594,2668.682,1060838.391,424835.353,1485673.719,10.000,
```

Verbose output from successful `fread()` without trailing commas:

```
fread(file2, sep=",", header=FALSE, skip=cellHead, nrows=length(cell), verbose=T)

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.008 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Looking for supplied sep ',' on line 21 ('skip' has been supplied) ... found ok
Found 42 columns
First row with 42 fields occurs on line 21 (either column names or first row of data)
'header' changed by user from 'auto' to FALSE
Count of eol after first data row: 27992
Subtracted 1 for last eol and any trailing empty lines, leaving 27991 data rows
nrow limited to nrows passed in (27214)
Type codes (   first 5 rows): 413333333333333333313333333333333333333311
Type codes (+ middle 5 rows): 413333333333333333313333333333333333333311
Type codes (+   last 5 rows): 413333333333333333333333333333333333333311
Type codes: 413333333333333333333333333333333333333311 (after applying colClasses and integer64)
Type codes: 413333333333333333333333333333333333333311 (after applying drop or select (if supplied)
Allocating 42 column slots (42 - 0 dropped)
Bumping column 15 from REAL to STR on data row 2228, field contains '      NaN '
   0.002s (  0%) Memory map (rerun may be quicker)
   0.003s (  0%) sep and header detection
   0.227s ( 30%) Count rows (wc -l)
   0.002s (  0%) Column type detection (first, middle and last 5 rows)
   0.000s (  0%) Allocation of 27214x42 result (xMB) in RAM
   0.519s ( 69%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.002s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.755s        Total
```

