Here's a minimal example of some small files that I was trying to read with fread:
library(data.table)
V1 <- c("A;B;C", "D", "E;F")
V2 <- c("A;B;C", "D", "E")
fread(paste(V1, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
# V1 V2 V3
# 1: A B C
# 2: D
# 3: E F
fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
# V1
# 1: A;B;C
# 2: D
# 3: E
Notice that the second call returns only 1 column, while I was expecting 3. fread seems to be ignoring the sep provided and guessing the separator from the remaining rows. Here are two more "files": the first also fails, and the second works:
V3 <- c("A;B;C", ";D", "E")
V4 <- c("A;B;C", "D", ";E")
fread(paste(V3, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
# V1
# 1: A;B;C
# 2: ;D
# 3: E
fread(paste(V4, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
# V1 V2 V3
# 1: A B C
# 2: D
# 3: E
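As a cross-check that does not depend on fread's separator detection, the number of fields per line can be counted with base R's count.fields. This is only a diagnostic sketch for the V2 input above; the variable names con and n_fields are my own.

```r
# Count ";"-separated fields on each line of the V2 "file" using base R,
# independently of fread.
V2 <- c("A;B;C", "D", "E")

con <- textConnection(V2)            # treat the character vector as a file
n_fields <- count.fields(con, sep = ";")
close(con)

n_fields        # fields found on each line
max(n_fields)   # the column count I expected fread to return
```

The maximum here is the number of columns I expected from fread with sep = ";" and fill = TRUE.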
I tried specifying colClasses:
fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, colClasses = list(character = 1:3))
# Error in fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, :
# Column number 2 (colClasses[[1]][2]) is out of range [1,ncol=1]
And then tried setting skip = 0 (which works):
fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, skip = 0)
# V1 V2 V3
# 1: A B C
# 2: D
# 3: E
However, I don't want to set skip = 0, because then it doesn't seem to work when the sep value does not appear in the first row:
V5 <- c("D", "E", "A;B;C")
fread(paste(V5, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, skip = 0)
# V1
# 1: D
# 2: E
# 3: A;B;C
fread(paste(V5, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
# V1 V2 V3
# 1: D
# 2: E
# 3: A B C
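In the meantime, a possible workaround is to split the lines manually so the result does not depend on where the separator first appears. This is a minimal sketch: read_fixed_sep is a hypothetical helper of my own, not part of data.table, and it pads short rows with empty strings. Note that strsplit drops a trailing empty field, so a line ending in ";" yields one field fewer than fread would.

```r
library(data.table)

# Hypothetical helper (not a data.table function): split every line on a
# fixed separator and pad short rows with "" so all rows have equal width.
read_fixed_sep <- function(lines, sep = ";") {
  parts <- strsplit(lines, sep, fixed = TRUE)
  n <- max(lengths(parts))
  padded <- lapply(parts, function(p) c(p, rep("", n - length(p))))
  # rbind gives a character matrix; as.data.table names the columns V1..Vn
  as.data.table(do.call(rbind, padded))
}

V2 <- c("A;B;C", "D", "E")
V5 <- c("D", "E", "A;B;C")

res2 <- read_fixed_sep(V2)   # separator only in the first row
res5 <- read_fixed_sep(V5)   # separator only in the last row
```

Both inputs come back with three character columns regardless of which row contains the separator.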
Two questions:
- Should fread be ignoring a manually specified sep value?
- The documentation says that skip defaults to 0, but formals(fread)$skip returns [1] "__auto__". Should the documentation be updated to explain what "__auto__" represents?
sessionInfo()
# R version 3.4.2 (2017-09-28)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 17.10
#
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#
# locale:
# [1] LC_CTYPE=en_IN.UTF-8 LC_NUMERIC=C LC_TIME=en_IN.UTF-8
# [4] LC_COLLATE=en_IN.UTF-8 LC_MONETARY=en_IN.UTF-8 LC_MESSAGES=en_IN.UTF-8
# [7] LC_PAPER=en_IN.UTF-8 LC_NAME=C LC_ADDRESS=C
# [10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] data.table_1.10.5
#
# loaded via a namespace (and not attached):
# [1] compiler_3.4.2 tools_3.4.2 yaml_2.1.17