Just came across some inconsistencies in allow.cartesian:
require(data.table) # v1.9.5, commit 1813
x = data.table(a=rep(1:2, each=2), b=10, key="a")
# a b
#1: 1 10
#2: 1 10
#3: 2 10
#4: 2 10
y = data.table(a=rep(1L, 4), b=5:6, key="a")
# a b
#1: 1 5
#2: 1 6
#3: 1 5
#4: 1 6
y[x]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
# Join results in 10 rows; more than 8 = nrow(x)+nrow(i). Check for duplicate key values in i ...
y[x, nomatch=0L]
# a b i.b
#1: 1 5 10
#2: 1 6 10
#3: 1 5 10
#4: 1 6 10
#5: 1 5 10
#6: 1 6 10
#7: 1 5 10
#8: 1 6 10
?data.table explains allow.cartesian as:
FALSE prevents joins that would result in more than max(nrow(x),nrow(i)) rows.
Both joins result in more than max(nrow(x), nrow(i)) = 4 rows: nomatch=NA gives 10 and nomatch=0L gives 8. So why does the second one work fine? And why does the error message talk about the join result being larger than nrow(x) + nrow(i)?
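To make the arithmetic explicit, here is a small base R sketch (no data.table needed) that counts the rows each join would produce and compares them with the two limits. The variable names are mine, and the comparison against nrow(x) + nrow(i) is only inferred from the error message above, not read from the data.table source.

```r
# Key columns of the two tables from the example above.
x_a <- rep(1:2, each = 2)  # key of x, which plays the role of i in y[x]
y_a <- rep(1L, 4)          # key of y, the table being joined into

# For each row of i, count how many rows of y share its key value.
matches <- vapply(x_a, function(k) sum(y_a == k), integer(1))

# nomatch=NA keeps one NA row for every unmatched row of i;
# nomatch=0L drops unmatched rows entirely.
rows_nomatch_NA <- sum(pmax(matches, 1L))
rows_nomatch_0  <- sum(matches)

# Limit stated in ?data.table versus the limit named in the error message.
documented_limit <- max(length(y_a), length(x_a))
message_limit    <- length(y_a) + length(x_a)

print(c(nomatch_NA = rows_nomatch_NA, nomatch_0 = rows_nomatch_0,
        documented = documented_limit, message = message_limit))
```

If the check really is "result rows > nrow(x) + nrow(i)", that would explain the behaviour: 10 exceeds 8 and errors, while 8 does not exceed 8 and passes, even though both exceed the documented limit of 4.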
Additionally, if we rename allow.cartesian to allow.i.dups (#914), then the error should occur irrespective of the number of rows, depending only on whether i has duplicates in its key columns.
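A rough sketch of what such a duplicates-only check could look like, in base R on plain data.frames. The helper name i_has_key_dups is hypothetical and only illustrates the proposed rule; it is not how data.table implements the check.

```r
# Hypothetical helper: TRUE if `i` has repeated rows in its key columns.
# `i` is a data.frame, `key_cols` a character vector of column names.
i_has_key_dups <- function(i, key_cols) {
  anyDuplicated(i[, key_cols, drop = FALSE]) > 0L
}

# i from the example above: key values 1, 1, 2, 2 contain duplicates.
i_dups <- data.frame(a = rep(1:2, each = 2), b = 10)
# An i with unique key values for comparison.
i_uniq <- data.frame(a = 1:2, b = 10)

print(i_has_key_dups(i_dups, "a"))
print(i_has_key_dups(i_uniq, "a"))
```

Under this rule y[x] would error for both nomatch=NA and nomatch=0L, since the outcome no longer depends on the row count of the result.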