Should recycle with remainder in data.table() throw warning or error? #4253

@ToeKneeFan

According to #362, recycling with remainder in data.table() was changed to a warning for the sake of consistency within data.table. I cannot access the R-Forge link, but I assume this refers to the original behavior of :=.

However, since data.table v1.12.2 (07 Apr 2019), recycling for vectors of length > 1 in := has thrown an error:

:= no longer recycles length>1 RHS vectors. There was a warning when recycling left a remainder but no warning when the LHS length was an exact multiple of the RHS length (the same behaviour as base R). Consistent feedback for several years has been that recycling is more often a bug. In rare cases where you need to recycle a length>1 vector, please use rep() explicitly. Single values are still recycled silently as before. Early warning was given in this tweet. The 774 CRAN and Bioconductor packages using data.table were tested and the maintainers of the 16 packages affected (2%) were consulted before going ahead, #3310. Upon agreement we went ahead. Many thanks to all those maintainers for already updating on CRAN, #3347.
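The rule quoted above can be sketched roughly as follows. This is a minimal illustration that assumes data.table v1.12.2 or later; the table `DT` and its column names are made up for the example, and the exact error text is deliberately not reproduced.

```r
library(data.table)

# Illustrative table; the names are assumptions, not from the release notes.
DT <- data.table(id = 1:6)

# A single value is still recycled silently.
DT[, flag := 0L]

# A length-2 RHS is not recycled, even though 6 is an exact multiple of 2.
res <- tryCatch(
  DT[, grp := 1:2],
  error = function(e) "error"
)

# Recycling has to be requested explicitly with rep().
DT[, grp := rep(1:2, length.out = .N)]
```

After running this, `res` should hold the string "error" and `grp` should contain 1, 2 repeated three times.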

Would it therefore make sense to revert data.table() to throwing an error when recycling with remainder? This could break existing scripts, but it would make the behavior more consistent both within the data.table package and with data.frame(), and the reasons for recycling with remainder in data.table() seem no more compelling than those for :=.

Matching := exactly (an error whenever a vector of length > 1 is recycled) is probably too drastic, but an error on recycling with remainder seems like a good compromise, especially since the presumed motivation for changing the behavior to a warning no longer holds. As with :=, I suspect most cases of recycling with remainder are bugs.
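For comparison, here is a small base R sketch of the data.frame() behavior referred to above. The column names `a` and `b` are arbitrary, and the error message is captured instead of being hard-coded because its wording can vary with locale.

```r
# data.frame() refuses to recycle when the lengths leave a remainder.
remainder <- tryCatch(
  data.frame(a = 1:3, b = 1:5),
  error = function(e) conditionMessage(e)
)
print(remainder)

# It still recycles when the longer length is an exact multiple of the shorter.
exact <- data.frame(a = 1:2, b = 1:4)
print(nrow(exact))
```

So data.frame() already draws the line at "recycling with remainder" and not at "any recycling of length > 1", which is the compromise proposed here for data.table().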

# Minimal reproducible example

library(data.table)

data.table(1:3, 1:5)
#    V1 V2
# 1:  1  1
# 2:  2  2
# 3:  3  3
# 4:  1  4
# 5:  2  5
# Warning message:
# In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
#   Item 1 has 3 rows but longest item has 5; recycled with remainder.
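Until the behavior changes, the proposed strictness can be approximated in user code. The sketch below is only an illustration: `strict_data_table()` is a hypothetical helper, not part of data.table, and it assumes the warning text contains "recycled with remainder" as in the output above (a translated message would not match).

```r
library(data.table)

# Hypothetical helper, not part of data.table: build the table, but turn the
# "recycled with remainder" warning into an error. Other warnings pass through.
strict_data_table <- function(...) {
  withCallingHandlers(
    data.table(...),
    warning = function(w) {
      if (grepl("recycled with remainder", conditionMessage(w), fixed = TRUE)) {
        stop(conditionMessage(w), call. = FALSE)
      }
    }
  )
}

# Exact multiple: still allowed.
ok <- strict_data_table(1:2, 1:4)

# Remainder: now stops instead of warning.
bad <- tryCatch(strict_data_table(1:3, 1:5), error = function(e) "error")
```

Here `ok` has four rows, while `bad` ends up as the string "error".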

# Output of sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.6

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1    yaml_2.2.0     knitr_1.25     xfun_0.10
