segfault/unexpected results with fread and colClasses #3143

@kszela24

The gist of the issue is that when attempting to cast a character column to integer via colClasses, fread segfaults under some conditions and produces unexpected results in others (instead of quietly returning the column uncast, as character). The main examples were all run on 1.11.8. The same code worked on 1.10.4, and its output there is shown as the expected result.

I'm not sure whether each of these belongs in a separate issue, since they seem to be related. If they deserve separate issues, please let me know and I'll be happy to create them.

# Minimal reproducible examples
[1.11.8] A segfault occurs when casting a character column to integer, in a case where the character column should be returned unchanged:

testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.1)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(b = "integer"))
readDT
 *** caught segfault ***
address 0x14576766a, cause 'memory not mapped'

Traceback:
 1: data.table::fread("~/test_dt.csv", colClasses = c(b = "integer"))

[1.10.4] Expected result (a is numeric, b is character):

     a b
1: 1.0 1
2: 2.0 2
3: 3.0 E
4: 4.0 4
5: 5.1 5

[1.11.8] When the values in column a are all whole numbers (so a is read as integer), no segfault occurs, but every value in column b comes back as the same string. In this case, a is read in as integer, b as character:

testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.0)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(b = "integer"))
readDT
   a b
1: 1 5
2: 2 5
3: 3 5
4: 4 5
5: 5 5

[1.10.4] Expected result (a is integer, b is character):

   a b
1: 1 1
2: 2 2
3: 3 E
4: 4 4
5: 5 5

[1.11.8] The same behavior occurs when column a does not exist at all (column b is output as character):

testDT <- data.table::data.table(b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(b = "integer"))
readDT
   b
1: 5
2: 5
3: 5
4: 5
5: 5

[1.10.4] Expected result (b is character):

   b
1: 1
2: 2
3: E
4: 4
5: 5

[1.11.8] Lastly, if I additionally cast column a (otherwise read as integer) to numeric via colClasses, column b comes back as mostly empty strings. In this case, a is numeric, b is character:

testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.0)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(a = "numeric"
                                             , b = "integer"))
readDT
   a b
1: 1  
2: 2  
3: 3  
4: 4  
5: 5 5

[1.10.4] Expected result (a is numeric, b is character):

   a b
1: 1 1
2: 2 2
3: 3 E
4: 4 4
5: 5 5
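A possible way to sidestep the problem while it is open is sketched below. It assumes that requesting "character" explicitly avoids the failing fallback from "integer" (this has not been confirmed on 1.11.8), and then converts the column only when every value is an integer literal. The helper `is_integer_string` and the temporary file path are hypothetical names introduced for illustration; they are not part of data.table.

```r
# Hypothetical helper (not part of data.table): TRUE when every non-missing
# value is a plain integer literal such as "42" or "-7".
# Note: it does not check that the values fit in R's 32-bit integer range.
is_integer_string <- function(x) {
  all(grepl("^[+-]?[0-9]+$", x[!is.na(x)]))
}

# Write the same test data as above, but to a temporary file.
csv_path <- tempfile(fileext = ".csv")
testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.0)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, csv_path)

# Request character explicitly so fread never has to fall back from integer.
readDT <- data.table::fread(csv_path
                            , colClasses = c(b = "character"))

# Convert only when the cast is safe; the "E" value keeps b as character here.
if (is_integer_string(readDT$b)) {
  data.table::set(readDT, j = "b", value = as.integer(readDT$b))
}
str(readDT)
```

The cost is one extra pass over the column, but the decision to cast is made explicitly in user code rather than left to fread.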

# Output of sessionInfo()

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.8

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4    yaml_2.1.18
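Since the results above differ between 1.10.4 and 1.11.8, it may help to record the loaded data.table version right before running each example. A minimal sketch, where `dt_version` is just an illustrative variable name:

```r
# Report the data.table version that is actually installed and would be
# loaded, so each result can be attributed to the right release.
dt_version <- utils::packageVersion("data.table")
print(dt_version)

# TRUE on 1.11.8 (the version showing the problem in this report),
# FALSE on 1.10.4 (the version where the examples behaved as expected).
dt_version >= "1.11.8"
```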
