I am interested in the count of rows in the join result:
```r
library(data.table)
set.seed(108)
n=2e5
# v2 takes values 1..5, so each key matches about n/5 rows in the other table
d1=data.table(v1=1:n, v2=sample(5, n, replace=TRUE))
d2=data.table(v1=1:n, v2=sample(5, n, replace=TRUE))
d1[d2, .N, on="v2", allow.cartesian=TRUE]
```
Optimizing `.N` would make this query lightweight, because it does not need to allocate that many rows for the answer. In this particular case we could therefore continue processing instead of stopping with:
```
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
```
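As the error message itself suggests, `by = .EACHI` avoids the large allocation: `j` is evaluated once per row of `d2`, so `.N` is the number of matching rows in `d1` and only one count per row of `d2` is stored. A minimal sketch of getting the total that way (the names `per_row` and `total_eachi` are illustrative only). The counts are converted to double before summing, because the total exceeds the integer range and `sum()` on an integer vector returns `NA` on overflow. This counts matched pairs; it equals the row count of the full join here because every value of `v2` occurs in both tables.

```r
library(data.table)
set.seed(108)
n = 2e5
d1 = data.table(v1 = 1:n, v2 = sample(5, n, replace = TRUE))
d2 = data.table(v1 = 1:n, v2 = sample(5, n, replace = TRUE))

# by = .EACHI evaluates j once per row of d2, so .N is the number of
# rows of d1 matching that row; only nrow(d2) counts are allocated
per_row = d1[d2, .N, on = "v2", by = .EACHI]

# convert to double before summing: the total is larger than 2^31 - 1
total_eachi = sum(as.numeric(per_row$N))
total_eachi
```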
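The count is cheap because an equi-join on `v2` pairs every row of `d1` with every row of `d2` that has the same key, so the number of matched rows is the sum over keys k of n1(k) * n2(k), where n1(k) and n2(k) are the group sizes of key k in each table. A sketch that computes this from two tiny grouped tables instead of materializing the join (the names `c1`, `c2` and `total_grouped` are illustrative only; like `by = .EACHI`, it counts matched pairs, which equals the full join size here since every `v2` value occurs in both tables):

```r
library(data.table)
set.seed(108)
n = 2e5
d1 = data.table(v1 = 1:n, v2 = sample(5, n, replace = TRUE))
d2 = data.table(v1 = 1:n, v2 = sample(5, n, replace = TRUE))

# group sizes per key in each table (at most 5 rows each)
c1 = d1[, .N, by = v2]
c2 = d2[, .N, by = v2]

# inner join of the two small count tables; d2's count is named i.N.
# each key k contributes n1(k) * n2(k) rows to the join result
total_grouped = c1[c2, on = "v2", nomatch = NULL][, sum(as.numeric(N) * i.N)]
total_grouped
```

This is roughly the work an optimized `.N` would need to do: a grouping pass over each table and a join of at most five rows, independent of how large the cartesian result is.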