Currently, data.table joins are consistent with base R.
This is somewhat awkward for some queries, because the join column in the result is filled from y even for rows that have no match in x.
library(data.table)
x = data.table(a=1:3, w=letters[1:3])
y = data.table(b=3:5, z=6:4)
x[y, on=c(a="b")]
# a w z
#1: 3 c 6
#2: 4 NA 5
#3: 5 NA 4
x[y, .(a, b), on=c(a="b")]
# a b
#1: 3 3
#2: 4 4
#3: 5 5
Consistency with base R could be kept in the merge.data.table method for the base R merge generic, while joins within [.data.table could be consistent with SQL, which does not impose the same limitation as base R. [.data.frame does not allow joins, so this wouldn't break consistency there.
The change would generally break code that relies on the invalid base R join behavior.
For reference, the SQL output from Postgres:
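A minimal sketch of the merge side of this split, assuming the x and y tables defined earlier. It only shows what merge returns today, which is the base R style result the proposal would keep there. The by.x, by.y and all.y arguments are existing merge.data.table parameters; the name base_style is purely illustrative.

```r
library(data.table)

x = data.table(a = 1:3, w = letters[1:3])
y = data.table(b = 3:5, z = 6:4)

# Right outer join through the merge generic: the key column `a` carries
# the values of y$b for rows that have no match in x, as in base R.
base_style = merge(x, y, by.x = "a", by.y = "b", all.y = TRUE)
print(base_style)
```

Code that depends on the base R result could move to merge and would not be affected by a change in [.data.table.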
#$`SELECT * FROM x RIGHT OUTER JOIN y ON x.a = y.b;`
# a w b z
#1: 3 c 3 6
#2: NA NA 4 5
#3: NA NA 5 4
#
#$`SELECT a, b FROM x RIGHT OUTER JOIN y ON x.a = y.b;`
# a b
#1: 3 3
#2: NA 4
#3: NA 5
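For comparison, the shape of the Postgres result can be approximated in data.table today by naming the columns explicitly in j. This is only a sketch, assuming a data.table version in which j accepts the x. and i. column prefixes; the name sql_style is illustrative and not part of any API.

```r
library(data.table)

x = data.table(a = 1:3, w = letters[1:3])
y = data.table(b = 3:5, z = 6:4)

# x.a is the join column as stored in x (NA where y has no match in x);
# i.b is the join column as stored in y.
sql_style = x[y, .(a = x.a, w, b = i.b, z), on = c(a = "b")]
print(sql_style)
```

This keeps both join columns in the result, which is what the SQL-consistent behavior would return without the explicit j expression.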
Just to link related issues: #1700, #1761, #1469