Original SO post: http://stackoverflow.com/questions/25145112/inconsistent-data-table-assignment-by-reference-behaviour
When assigning by reference in a data.table using a column from a second data.table, the results are inconsistent. When no rows match on the key columns of the two data.tables, the assignment expression y := y appears to be ignored entirely: not even a column of NAs is added.
library(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
print(dt1[dt2, y := y])
## id x # Would have also expected column: y
## 1: 1 3 # NA
## 2: 2 4 # NA
However, when there is a partial match, the non-matching rows get a placeholder NA.
dt2[, id := 2:3]
print(dt1[dt2, y := y])
## id x y
## 1: 1 3 NA # <-- placeholder NA here
## 2: 2 4 5
This wreaks havoc on later code that assumes a y column exists in all cases. Otherwise I keep having to write cumbersome additional checks to take into account both cases.
Is there an elegant way around this inconsistency?
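One possible workaround, offered as a sketch rather than the intended data.table idiom: look the values up with base R's match() inside a plain := assignment, instead of assigning inside the join. Because match() returns NA for every id missing from dt2, the y column is created whether or not any rows match. This assumes id is unique in dt2; with duplicate ids, match() silently picks the first occurrence.

```r
library(data.table)

dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")

# No ids in common: match() yields NA for every row of dt1,
# so y is still added, filled with NA.
dt1[, y := dt2$y[match(id, dt2$id)]]
no_match_y <- dt1$y

# Partial match: only id 2 is found in dt2, so row 1 keeps NA.
dt2[, id := 2:3]
dt1[, y := dt2$y[match(id, dt2$id)]]
partial_match_y <- dt1$y
```

Later code can then rely on y existing in both cases, without a separate check for the no-match situation.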