Regression in unique.data.table() as of data.table 1.12.0 #3332

@jameslamb

# Minimal reproducible example

I believe there's been a regression in data.table 1.12.0 relative to 1.11.8.

TL;DR: I think the implementation of unique.data.table() has changed and that the new implementation doesn't support list columns.

Some users of uptasticsearch have reported receiving an error message from our code that looks like this:

Error in forderv(x, by = by, sort = FALSE, retGrp = TRUE) :
Column 2 of by= (2) is type 'list', not yet supported

After investigating tonight, I found the source of the issue and can reproduce it. I believe the behavior of unique() has changed and I'd consider that change a regression.

On 1.12.0:

someDT <- data.table::data.table(
    col1 = 1:2,
    col2 = list(list(TRUE, FALSE), list(FALSE, TRUE))
)
unique(someDT)

This raises the following error:

Error in forderv(x, by = by, sort = FALSE, retGrp = TRUE) :
Column 2 of by= (2) is type 'list', not yet supported

To downgrade to the previous release, I ran the following from the command line:

Rscript -e "remove.packages('data.table')"
wget http://cran.rstudio.com/src/contrib/Archive/data.table/data.table_1.11.8.tar.gz
R CMD INSTALL data.table_1.11.8.tar.gz
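Before re-running the example, it can help to confirm which version R actually picks up after the reinstall. This is a minimal sketch: the variable names `installed` and `affected` are illustrative only, and the 1.12.0 threshold is taken from this report rather than from any official changelog.

```r
# Report the data.table version that is currently installed
installed <- utils::packageVersion("data.table")
message("data.table version: ", as.character(installed))

# TRUE if the installed version is at or after the release where
# the error in this report was observed (assumption: 1.12.0)
affected <- installed >= "1.12.0"
message("At or after 1.12.0: ", affected)
```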

Once I had v1.11.8 installed, I re-ran the R code above:

someDT <- data.table::data.table(
    col1 = 1:2,
    col2 = list(list(TRUE, FALSE), list(FALSE, TRUE))
)
unique(someDT)

This works as expected and returns:

   col1   col2
1:    1 <list>
2:    2 <list>

So I went to the blame for unique.data.table() to see what had changed. It looked like no substantive changes had been made between 1.11.8 and now, so I then looked at the blame for forderv().

I didn't see anything meaningful in the blame for forderv() either.

I decided to try one more thing: searching for the text "not yet supported" (from the error message). That led me to forder.c, whose blame led me to #3124.

As far as I can tell, this PR is the source of the problem above. There is no description on the PR so I'm not sure if this is an unintended side effect or a known regression that will be fixed in a future release of data.table.
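In the meantime, one possible workaround is to restrict the `by` argument of unique() to the non-list columns, so that forderv() is never asked to order a list column. This is only a sketch under that assumption: `atomic_cols` and `dedupedDT` are names made up for illustration, and the semantics differ from comparing whole rows, since rows that differ only in a list column would be treated as duplicates.

```r
library(data.table)

someDT <- data.table(
    col1 = 1:2,
    col2 = list(list(TRUE, FALSE), list(FALSE, TRUE))
)

# Names of the columns that are not list columns
atomic_cols <- names(someDT)[!vapply(someDT, is.list, logical(1))]

# De-duplicate on those columns only; list columns are carried along
# in the result but are not used for the comparison
dedupedDT <- unique(someDT, by = atomic_cols)
print(dedupedDT)
```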

# Output of sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 yaml_2.2.0 data.table_1.12.0
