When I use `:=` together with `.SD`, `.SDcols` and `get()`, for example,

```r
dt[, z := ncol(.SD) + get("y"), .SDcols = "x"]
```

the first call works, but once `z` already exists the same call produces incorrect results: `.SD` then contains both `x` and `z`, although `.SDcols` restricts it to `x` only.

The following minimal example demonstrates the problem:
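```r
library(data.table)

dt <- data.table(x = seq(1, 10), y = seq(10, 1))
dt

# z does not exist yet: .SD contains only x, so z = 1 + y
dt[, z := ncol(.SD) + y, .SDcols = "x"]
dt

# z now exists and j uses get(): .SD contains x and z, so z = 2 + y
dt[, z := ncol(.SD) + get("y"), .SDcols = "x"]
dt

# Inspect .SD while z exists
dt[, z := {
  str(.SD)
  ncol(.SD) + get("y")
}, .SDcols = "x"]

# Remove z and repeat with inherits = FALSE
dt[, z := NULL]
dt[, z := {
  str(.SD)
  ncol(.SD) + get("y", inherits = FALSE)
}, .SDcols = "x"]
```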
The output looks like this:

```
> library(data.table)
> dt <- data.table(x = seq(1, 10), y = seq(10, 1))
> dt
     x  y
 1:  1 10
 2:  2  9
 3:  3  8
 4:  4  7
 5:  5  6
 6:  6  5
 7:  7  4
 8:  8  3
 9:  9  2
10: 10  1
> dt[, z := ncol(.SD) + y, .SDcols = "x"]
> dt
     x  y  z
 1:  1 10 11
 2:  2  9 10
 3:  3  8  9
 4:  4  7  8
 5:  5  6  7
 6:  6  5  6
 7:  7  4  5
 8:  8  3  4
 9:  9  2  3
10: 10  1  2
> dt[, z := ncol(.SD) + get("y"), .SDcols = "x"]
> dt
     x  y  z
 1:  1 10 12
 2:  2  9 11
 3:  3  8 10
 4:  4  7  9
 5:  5  6  8
 6:  6  5  7
 7:  7  4  6
 8:  8  3  5
 9:  9  2  4
10: 10  1  3
> dt[, z := {
+   str(.SD)
+   ncol(.SD) + get("y")
+ }, .SDcols = "x"]
Classes ‘data.table’ and 'data.frame': 10 obs. of 2 variables:
 $ x: int 1 2 3 4 5 6 7 8 9 10
 $ z: int 12 11 10 9 8 7 6 5 4 3
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, ".data.table.locked")= logi TRUE
> dt[, z := NULL]
> dt[, z := {
+   str(.SD)
+   ncol(.SD) + get("y", inherits = FALSE)
+ }, .SDcols = "x"]
Classes ‘data.table’ and 'data.frame': 10 obs. of 1 variable:
 $ x: int 1 2 3 4 5 6 7 8 9 10
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, ".data.table.locked")= logi TRUE
```
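For anyone who needs a stopgap, here is a minimal sketch that sidesteps the problem rather than fixing it. It assumes no grouping is involved and that only the number of selected columns is needed, so neither `.SD` nor `get()` appears in `j`. The names `sd_cols` and `y_col` are hypothetical and not part of the original example.

```r
library(data.table)

dt <- data.table(x = seq(1, 10), y = seq(10, 1))

sd_cols <- "x"  # hypothetical name: the columns .SDcols was meant to select
y_col   <- "y"  # hypothetical name: the column previously fetched with get()

# Run the update twice; the second pass happens while z already exists,
# which is the situation that triggers the problem in the report.
for (i in 1:2) {
  # set() assigns by reference, like :=, without evaluating .SD at all
  set(dt, j = "z", value = length(sd_cols) + dt[[y_col]])
}

dt$z
```

Because the column count comes from `length(sd_cols)` instead of `ncol(.SD)`, the result does not depend on whether `z` exists before the call.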