Currently, `dt[, (cols) := list(...), by = group]` seems to silently recycle `list(...)` when replacing the values of `cols`. If `length(list) < length(cols)`, the list is recycled; if `length(list) > length(cols)`, the surplus elements of the list are silently dropped, as demonstrated below.

When `by = group` is absent, the lengths are checked:
```r
library(data.table)
dt <- data.table(id = 1:10)
xn <- 1:3
xcols <- paste0("x", xn)
# Three target columns but only two items: this is rejected
dt[, (xcols) := list(10, 20)]
#> Error in `[.data.table`(dt, , `:=`((xcols), list(10, 20))): Supplied 3 columns to be assigned 2 items. Please see NEWS for v1.12.2.
```
However, if `by = group` is used, the list is recycled:

```r
library(data.table)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
# Same mismatch (3 columns, 2 items), now grouped: no error, and x3 receives 10
dt[, (xcols) := list(10, 20), by = group]
dt
#>     id group x1 x2 x3
#>  1:  1     2 10 20 10
#>  2:  2     2 10 20 10
#>  3:  3     2 10 20 10
#>  4:  4     2 10 20 10
#>  5:  5     2 10 20 10
#>  6:  6     2 10 20 10
#>  7:  7     1 10 20 10
#>  8:  8     2 10 20 10
#>  9:  9     2 10 20 10
#> 10: 10     2 10 20 10
```
Likewise, when `list(...)` has more items than there are columns, the extra items are silently dropped:

```r
library(data.table)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
# Four items for three columns: the fourth item (10) is discarded without a warning
dt[, (xcols) := list(40, 30, 20, 10), by = group]
dt
#>     id group x1 x2 x3
#>  1:  1     1 40 30 20
#>  2:  2     1 40 30 20
#>  3:  3     2 40 30 20
#>  4:  4     2 40 30 20
#>  5:  5     2 40 30 20
#>  6:  6     1 40 30 20
#>  7:  7     1 40 30 20
#>  8:  8     2 40 30 20
#>  9:  9     2 40 30 20
#> 10: 10     1 40 30 20
```
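In both grouped outputs above, the values written to `x1`, `x2` and `x3` are exactly what base R's `rep()` with `length.out = length(cols)` produces from the supplied list. The snippet below only models the observed result in base R; it is an illustration, not a claim about how `data.table` implements the assignment internally.

```r
# Model the observed grouped behaviour with base R recycling (illustration only).
n_cols <- 3L

# Too few items: list(10, 20) is recycled to list(10, 20, 10)
short_rhs <- rep(list(10, 20), length.out = n_cols)

# Too many items: list(40, 30, 20, 10) is truncated to list(40, 30, 20)
long_rhs <- rep(list(40, 30, 20, 10), length.out = n_cols)

str(short_rhs)
str(long_rhs)
```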
In my experience, this recycling is almost never wanted: when it happens, it usually means something is wrong with my code.
Consider the following example, where `list(...)` is produced by `lapply(.SD, ...)`. When the function is written inline and is somewhat complicated, it is easy to forget `.SDcols`.
```r
library(data.table)
set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
# Fill x1, x2 and x3 with uniform random numbers
for (i in xn) {
  dt[, xcols[[i]] := runif(.N)]
}
# Scale each column by its within-group SD; .SDcols has been forgotten here
dt[, (xcols) := lapply(.SD, function(x) {
  x / sd(x)
}), by = group]
dt
#>     id group        x1        x2        x3
#>  1:  1     1 0.2672612 2.7645427 3.2041655
#>  2:  2     1 0.5345225 1.3098014 2.4955128
#>  3:  3     1 0.8017837 1.9576795 2.3071378
#>  4:  4     2 2.3421602 1.5214175 4.9351189
#>  5:  5     1 1.3363062 0.2973764 2.3618854
#>  6:  6     2 3.5132403 2.3907258 3.5168344
#>  7:  7     2 4.0987803 0.6538253 2.7005050
#>  8:  8     2 4.6843204 0.1117471 2.9490603
#>  9:  9     1 2.4053512 0.9474491 1.0415680
#> 10: 10     1 2.6726124 2.7578116 0.5299108
```
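Incorrect results are produced silently. Without `.SDcols`, `.SD` contains `id` as well as `x1`, `x2` and `x3`, so `lapply()` returns four items for three columns. The first three are assigned, which shifts every result one column to the right (`x1` now holds the scaled `id`), and the fourth, the scaled `x3`, is dropped. The correct results, with `.SDcols` added, are: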
```r
library(data.table)
set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
# Fill x1, x2 and x3 with uniform random numbers
for (i in xn) {
  dt[, xcols[[i]] := runif(.N)]
}
# Same scaling, but .SD is now restricted to the three target columns
dt[, (xcols) := lapply(.SD, function(x) {
  x / sd(x)
}), by = group, .SDcols = xcols]
dt
#>     id group        x1        x2        x3
#>  1:  1     1 2.7645427 3.2041655 2.5018371
#>  2:  2     1 1.3098014 2.4955128 2.3440794
#>  3:  3     1 1.9576795 2.3071378 1.7943807
#>  4:  4     2 1.5214175 4.9351189 2.9399476
#>  5:  5     1 0.2973764 2.3618854 0.0639438
#>  6:  6     2 2.3907258 3.5168344 1.7658739
#>  7:  7     2 0.6538253 2.7005050 2.8031711
#>  8:  8     2 0.1117471 2.9490603 0.7998165
#>  9:  9     1 0.9474491 1.0415680 0.8266013
#> 10: 10     1 2.7578116 0.5299108 0.6017398
```
I suggest that recycling of `list(...)` be made consistent with the behavior `data.table` has already adopted for row recycling, which only accepts zero, one, or `.N` elements.
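A minimal sketch of one possible reading of this proposal, written as a plain R function: a list on the right-hand side is accepted only if it has a single item (recycled to every target column) or exactly one item per target column. The name `check_rhs` is hypothetical and not part of `data.table`, and the zero-length case is left out because its intended meaning for columns is not spelled out here.

```r
# Hypothetical helper (not part of data.table): one reading of the proposed rule.
# A list RHS is accepted only if it has a single item (recycled to every target
# column) or exactly one item per target column; anything else is an error.
check_rhs <- function(rhs, cols) {
  n_items <- length(rhs)
  n_cols <- length(cols)
  if (n_items != 1L && n_items != n_cols) {
    stop(sprintf("Supplied %d columns to be assigned %d items.", n_cols, n_items),
         call. = FALSE)
  }
  # A single item is recycled explicitly; otherwise the list is returned unchanged
  if (n_items == 1L) rep(rhs, length.out = n_cols) else rhs
}

cols <- paste0("x", 1:3)
ok_single <- check_rhs(list(0), cols)           # recycled to list(0, 0, 0)
ok_exact <- check_rhs(list(10, 20, 30), cols)   # accepted unchanged
# Mismatched lengths raise an error; capture the message instead of stopping
too_few <- tryCatch(check_rhs(list(10, 20), cols), error = conditionMessage)
too_many <- tryCatch(check_rhs(list(40, 30, 20, 10), cols), error = conditionMessage)
```

With current versions the same helper could also act as a guard inside `j`, for example `dt[, (xcols) := check_rhs(lapply(.SD, function(x) x / sd(x)), xcols), by = group]`. In the forgotten-`.SDcols` example, `.SD` holds four columns (`id`, `x1`, `x2`, `x3`), so this call should stop with an error instead of silently shifting the results.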