[Request] Please include "\n" as additional default separator for parameter sep in fread() for improved backwards-compatibility

Hi,

Currently, the parameter sep in function fread defaults to the set [,\t |;:].
I suggest including "\n" as the final separator in the default set, as this might improve backward compatibility of existing code with previous versions of data.table.
An example is a file in which each line holds a single string, but where some of the default sep characters occasionally occur within a string. In 1.9.5, reading such a file produces an error because of the string "c:4" on line 3, unless sep = "\n" is specified explicitly. In 1.9.4 the same file is read without error.

Here is an example:
(I am using the data.table 1.9.5 development version from 8 March 2015; the txt file is available at https://www.dropbox.com/s/y6cmkcza36c1qjn/ex_150309.txt?dl=0)

```
myfile = "/net/ifs1/san_projekte/projekte/genstat/09_nutzer/holger/39_dt_request//ex_150309.txt" # available at https://www.dropbox.com/s/y6cmkcza36c1qjn/ex_150309.txt?dl=0


aa = fread(myfile, verbose = T)

## Input contains no \n. Taking this to be a filename to open
## File opened, filesize is 0.000000 GB.
## Memory mapping ... ok
## Detected eol as \r\n (CRLF) in that order, the Windows standard.
## Positioned on line 1 after skip or autostart
## This line is the autostart and not blank so searching up for the last non-blank ... line 1
## Detecting sep ... ':'
## Detected 2 columns. Longest stretch was from line 3 to line 3
## Starting data input on line 3 (either column names or first row of data). First 10 characters: c:4

## Warning in fread(myfile, verbose = T): Starting data input on line 3 and
## discarded previous non-empty line: b

## Some fields on line 3 are not type character (or are empty). Treating as a data row and using default column names.
## Count of eol: 3 (including 1 at the end)
## Count of sep: 1
## nrow = MIN( nsep [1] / ncol [2] -1, neol [3] - nblank [1] ) = 1

## Error in fread(myfile, verbose = T): Expected sep (':') but new line, EOF (or other non printing character) ends field 0 when detecting types (   first): d

aa = fread(myfile, verbose = T, sep = "\n")

## Input contains no \n. Taking this to be a filename to open
## File opened, filesize is 0.000000 GB.
## Memory mapping ... ok
## Detected eol as \r\n (CRLF) in that order, the Windows standard.
## Positioned on line 1 after skip or autostart
## This line is the autostart and not blank so searching up for the last non-blank ... line 1
## Using supplied sep '
## ' ... Deducing this is a single column input.
## Starting data input on line 1 (either column names or first row of data). First 10 characters: a
## All the fields on line 1 are character fields. Treating as the column names.
## Count of eol: 4 (including 1 at the end)
## Count of sep: 3
## ncol==1 so sep count ignored
## Type codes (   first 5 rows): 4
## Type codes: 4 (after applying colClasses and integer64)
## Type codes: 4 (after applying drop or select (if supplied)
## Allocating 1 column slots (1 - 0 dropped)
## Read 3 rows. Exactly what was estimated and allocated up front
##    0.000s ( 71%) Memory map (rerun may be quicker)
##    0.000s ( 13%) sep and header detection
##    0.000s (  3%) Count rows (wc -l)
##    0.000s (  6%) Column type detection (first, middle and last 5 rows)
##    0.000s (  3%) Allocation of 3x1 result (xMB) in RAM
##    0.000s (  2%) Reading data
##    0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
##    0.000s (  0%) Coercing data already read in type bumps (if any)
##    0.000s (  2%) Changing na.strings to NA
##    0.000s        Total

aa

##      a
## 1:   b
## 2: c:4
## 3:   d

sessionInfo()

## R version 3.1.2 (2014-10-31)
## Platform: x86_64-suse-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table_1.9.5 knitr_1.9       
## 
## loaded via a namespace (and not attached):
## [1] chron_2.3-45   evaluate_0.5.5 formatR_1.0    stringr_0.6.2 
## [5] tools_3.1.2
```
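
For anyone who wants to try this without the Dropbox file, here is a minimal sketch of the workaround. The file contents (four lines: a, b, c:4, d) are an assumption reconstructed from the output above, and the original CRLF line endings are not reproduced. header = FALSE is set explicitly so that the result does not depend on header detection, which may differ between data.table versions.

```r
library(data.table)

# Assumed file contents, reconstructed from the output shown above.
lines <- c("a", "b", "c:4", "d")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# An explicit sep = "\n" makes fread treat each line as a single field,
# so the ':' in "c:4" is not picked up as a column separator.
# header = FALSE keeps "a" as a data row rather than a column name.
aa <- fread(tmp, sep = "\n", header = FALSE)

unlink(tmp)
```

With these settings the result should be a one-column table in which "c:4" is kept intact as a single value.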
