I'm working on a production system in which a base data.table (nearly 200 columns and 6M rows) is generated at the beginning. Tens of scripts then work on this data.table, compute derived variables in place, and produce a final column of values. To avoid a deep copy of the data.table, each script uses dt[TRUE] to make a shallow copy and then adds new derived columns to it. As documented, this does not prevent in-place modification of existing columns in the original dt. Therefore I'm wondering if there's a way to protect existing columns from modification.
A basic workflow looks like this:
dt <- generate_data() # a big data.table
run("script-1.R")
run("script-2.R")
# ...
run("script-100.R")
where run() calls sys.source() on a given script file in a sandbox environment.
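For concreteness, a minimal sketch of what such a run() helper might look like. The body is an assumption for illustration, not the actual production code: it sources the script into a fresh environment whose parent is the global environment, so the script can still see dt while anything it assigns (such as ft) stays out of the global workspace.

```r
# Hypothetical sketch of run(); the real helper may differ.
run <- function(file) {
  # Fresh environment per script. Its parent is the global environment,
  # so `dt` and attached packages are still found by normal lookup.
  env <- new.env(parent = globalenv())
  # Evaluate the script's top-level expressions inside that environment.
  sys.source(file, envir = env)
  # Return the environment invisibly so results can be inspected if needed.
  invisible(env)
}
```

Note that an environment like this only isolates variable bindings. It does not stop a script from changing dt by reference, which is exactly the concern here.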
In each script-n.R, the code looks like this:
ft <- dt[TRUE]
ft[, x1 := ..., by = col1]
ft[, x2 := ..., by = col2]
# ...
ft[, x := x1 + x2 * abs(x3 - x4)]
where every column assigned in ft is supposed to be new, i.e. not already present in dt, so that columns are only added and none of the pre-existing columns in dt is modified.
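A small self-contained instance of this pattern, using a toy table. The column values and the expressions standing in for the elided `...` are made up for illustration. It shows the intended case, where the script only adds columns, so they appear in ft and the column set of dt is untouched.

```r
library(data.table)

# Toy stand-in for the real 200-column table (hypothetical data).
dt <- data.table(
  col1 = c("a", "a", "b"),
  col2 = c("u", "v", "v"),
  x3   = c(1, 2, 3),
  x4   = c(3, 2, 1)
)
# Snapshot the names; copy() guards against names being updated by reference.
orig_cols <- copy(names(dt))

ft <- dt[TRUE]                     # the copy pattern used in each script
ft[, x1 := sum(x3), by = col1]     # hypothetical derived column
ft[, x2 := mean(x4), by = col2]    # hypothetical derived column
ft[, x := x1 + x2 * abs(x3 - x4)]

# Columns the script added to ft; dt itself gains none of them.
new_cols <- setdiff(names(ft), orig_cols)
```

The open problem is the opposite case: if a script assigns to a name that is already in orig_cols, nothing in this pattern flags it.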
I know the safest approach is to copy dt every time, but that is simply too time-consuming, since the production run is also time-critical. So the question is: is there a way to protect all columns in dt, so that ft[, x1 := ...] can only add new columns and cannot change any column that already exists in dt?