Currently, keys and secondary indices are dropped if any of the key columns is altered via a := assignment.
This is not necessary, however: the rows remain sorted by every key column that precedes the assigned one, so the key is still valid for those leading columns.
Given this, we can retain keys and secondary indices in more cases and improve data.table performance.
An example:
library(data.table)
DT <- data.table(x1 = sample(1:10, 100, replace = TRUE),
                 x2 = sample(1:10, 100, replace = TRUE),
                 x3 = sample(1:10, 100, replace = TRUE),
                 y = rnorm(100))
setkey(DT, x1, x2, x3)
## DT has a key on x1, x2, x3
print(key(DT))
# [1] "x1" "x2" "x3"
## assigning to x3 drops the key completely:
DT[2, x3 := 1000]
print(key(DT))
# NULL
## even though the sorting of x1 and x2 is of course not affected:
print(length(data.table:::forderv(DT, by = c("x1", "x2"))) == 0)
# TRUE
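The rule proposed above (keep the key columns that come before the first assigned one) can be sketched in a few lines of base R. This is only an illustration of the idea, not data.table's actual implementation; `retained_key` is a hypothetical helper name and is not part of the package.

```r
## Hypothetical helper (not part of data.table): given the current key
## columns and the columns touched by a := assignment, return the leading
## key columns whose sort order is unaffected, or NULL if none remain.
retained_key <- function(key_cols, assigned) {
  hit <- which(key_cols %in% assigned)
  ## no key column was assigned: the whole key stays valid
  if (length(hit) == 0L) return(key_cols)
  ## keep only the columns before the first assigned key column
  prefix <- head(key_cols, min(hit) - 1L)
  if (length(prefix) == 0L) NULL else prefix
}

print(retained_key(c("x1", "x2", "x3"), "x3"))
# [1] "x1" "x2"
print(retained_key(c("x1", "x2", "x3"), "x1"))
# NULL
print(retained_key(c("x1", "x2", "x3"), "y"))
# [1] "x1" "x2" "x3"
```

Until something like this is built into :=, the retained columns could be passed to setkeyv() after the assignment to restore the shortened key by hand.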