I am working with proprietary data, so creating a reproducible example was a bit tricky, but I think this works.
X <- setDT(structure(list(id = c(6456372L, 6456372L, 6456372L, 6456372L,
6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L,
6456372L, 6456372L, 6456372L), id_round = c(197801L, 199405L,
199501L, 197901L, 197905L, 198001L, 198005L, 198101L, 198105L,
198201L, 198205L, 198301L, 198305L, 198401L), field = c(NA, NA,
NA, "medicine", "medicine", "medicine", "medicine", "medicine",
"medicine", "medicine", "medicine", "medicine", "medicine", "medicine"
)), class = c("data.table", "data.frame"
), sorted = "id"))
Y <- setDT(structure(list(id = c(6456372L, 6456345L, 6456356L), id_round = c(197705L,
197905L, 201705L), field = c("medicine", "teaching", "health"
), prio = c(6L, 1L, 10L)), class = c("data.table",
"data.frame"), sorted = c("id_round",
"id", "prio", "field")))
X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI]
id id_round field V1 V2
1: 6456372 197705 medicine 197901 197705
2: 6456345 197905 teaching NA 197905
3: 6456356 201705 health NA 201705
So everything seems to work fine, but these results are supposed to be merged back into the main data set Y, and this is where I run into trouble. The result does not merge, and moreover I can no longer subset it by id:
> X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI][id == 6456372]
Empty data.table (0 rows and 5 cols): id,id_round,field,V1,V2
I would of course expect to find a match here. The strange thing is that it works if I drop by=.EACHI, or if I drop the last key column "prio" from Y:
> X[Y, on = .(id, id_round > id_round, field), .(id, field, x.id_round[1], i.id_round[1])][id == 6456372]
id field V3 V4
1: 6456372 medicine 197901 197705
2: 6456372 medicine 197901 197705
3: 6456372 medicine 197901 197705
4: 6456372 medicine 197901 197705
5: 6456372 medicine 197901 197705
6: 6456372 medicine 197901 197705
7: 6456372 medicine 197901 197705
8: 6456372 medicine 197901 197705
9: 6456372 medicine 197901 197705
10: 6456372 medicine 197901 197705
11: 6456372 medicine 197901 197705
> X[Y[, .(id, id_round, field)], on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI][id == 6456372]
id id_round field V1 V2
1: 6456372 197705 medicine 197901 197705
Y is keyed by "prio" (among other columns), but "prio" is not included in the join. The problem seems to depend on how this id compares to the other ids in Y, because if I change it to 6456344 or anything lower, I get the expected results.
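A possible way to narrow this down, sketched under the assumption (not confirmed) that the by=.EACHI result carries a key that no longer matches its row order: inspect key() on the result, then remove the key from a copy and retry the subset. The names res and res_nokey are only for illustration, and X and Y are rebuilt here with data.table() and setkey() rather than structure() so the block runs on its own.

```r
library(data.table)

# Same data as in the example above, rebuilt with data.table() and setkey().
X <- data.table(
  id = rep(6456372L, 14),
  id_round = c(197801L, 199405L, 199501L, 197901L, 197905L, 198001L, 198005L,
               198101L, 198105L, 198201L, 198205L, 198301L, 198305L, 198401L),
  field = c(rep(NA_character_, 3), rep("medicine", 11))
)
setkey(X, id)

Y <- data.table(
  id = c(6456372L, 6456345L, 6456356L),
  id_round = c(197705L, 197905L, 201705L),
  field = c("medicine", "teaching", "health"),
  prio = c(6L, 1L, 10L)
)
setkey(Y, id_round, id, prio, field)

res <- X[Y, on = .(id, id_round > id_round, field),
         .(x.id_round[1], i.id_round[1]), by = .EACHI]

# Which key (if any) does the join result claim to have?
print(key(res))

# Assumption: if a stale key is the cause, dropping it should make the
# subset fall back to a search that does not rely on the claimed sort order.
res_nokey <- copy(res)
setkey(res_nokey, NULL)
print(res_nokey[id == 6456372])
```

If the subset returns the expected row only after the key is removed, that would point at the key attribute of the result rather than at the join values themselves.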
Running latest dev:
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.7 colorout_1.2-2
loaded via a namespace (and not attached):
[1] compiler_4.0.4 jsonlite_1.7.2 rlang_0.4.10