fread fill=true and sep= provided could still read as 1-column #2666

@mrdwab

Here's a minimal example of some small files that I was trying to read with fread:

library(data.table)

V1 <- c("A;B;C", "D", "E;F")
V2 <- c("A;B;C", "D", "E")

fread(paste(V1, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#    V1 V2 V3
# 1:  A  B  C
# 2:  D      
# 3:  E  F   

fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#       V1
# 1: A;B;C
# 2:     D
# 3:     E

Note that the second call returns only 1 column, whereas I expected 3. fread appears to ignore the supplied sep and guess the layout from the remaining rows. Here are two more "files": the first also fails, and the second works:

V3 <- c("A;B;C", ";D", "E")
V4 <- c("A;B;C", "D", ";E")

fread(paste(V3, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#       V1
# 1: A;B;C
# 2:    ;D
# 3:     E

fread(paste(V4, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#    V1 V2 V3
# 1:  A  B  C
# 2:  D      
# 3:     E   
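
For reference, here is a quick base R count of the fields per line in the four samples, obtained by splitting on ";" with strsplit (nothing here uses fread). The two samples that read correctly (V1 and V4) both end in a line with two fields, while the two that collapse to one column end in a line with a single field. That may or may not be related to the cause:

```r
# The four sample "files" from above, one element per line
samples <- list(
  V1 = c("A;B;C", "D", "E;F"),
  V2 = c("A;B;C", "D", "E"),
  V3 = c("A;B;C", ";D", "E"),
  V4 = c("A;B;C", "D", ";E")
)

# Number of ";"-separated fields on each line (rows = lines, columns = samples)
field_counts <- sapply(samples, function(x) lengths(strsplit(x, ";", fixed = TRUE)))
print(field_counts)
#      V1 V2 V3 V4
# [1,]  3  3  3  3
# [2,]  1  1  2  1
# [3,]  2  1  1  2
```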

I tried specifying colClasses:

fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, colClasses = list(character = 1:3))
# Error in fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE,  : 
#   Column number 2 (colClasses[[1]][2]) is out of range [1,ncol=1]

And then tried setting skip = 0 (which works):

fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, skip = 0)
#    V1 V2 V3
# 1:  A  B  C
# 2:  D      
# 3:  E      

However, I would rather not set skip = 0, because the separator then appears to be missed when it does not occur in the first row:

V5 <- c("D", "E", "A;B;C")

fread(paste(V5, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, skip = 0)
#       V1
# 1:     D
# 2:     E
# 3: A;B;C

fread(paste(V5, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#    V1 V2 V3
# 1:  D      
# 2:  E      
# 3:  A  B  C
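
As a stopgap, a base R sketch along these lines splits every line on the fixed separator and pads the short rows, so nothing is guessed. read_ragged is a hypothetical helper name, not part of data.table; it returns a plain data.frame with every column as character, and it assumes no quoted fields contain the separator:

```r
# Hypothetical helper (not part of data.table): split on a fixed separator
# and pad short rows with "" so that every row has the same width.
read_ragged <- function(lines, sep = ";") {
  parts <- strsplit(lines, sep, fixed = TRUE)
  n <- max(lengths(parts))                      # widest row sets the column count
  padded <- lapply(parts, function(p) c(p, rep("", n - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(n))
  out
}

# The separator position no longer matters: it can be first, last, or in between
res_v2 <- read_ragged(c("A;B;C", "D", "E"))
res_v5 <- read_ragged(c("D", "E", "A;B;C"))
```

Both calls should give three columns. The result can be passed to data.table::setDT() if a data.table is needed.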

Two questions:

  1. Should fread be ignoring a manually specified sep value?
  2. The documentation says that skip defaults to 0, but formals(fread)$skip returns [1] "__auto__". Should the documentation be updated to explain what "__auto__" represents?

sessionInfo()
# R version 3.4.2 (2017-09-28)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 17.10
# 
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
# 
# locale:
#  [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C               LC_TIME=en_IN.UTF-8       
#  [4] LC_COLLATE=en_IN.UTF-8     LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8   
#  [7] LC_PAPER=en_IN.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] data.table_1.10.5
# 
# loaded via a namespace (and not attached):
# [1] compiler_3.4.2 tools_3.4.2    yaml_2.1.17 
