
Need a way to protect data from shallow copy #2277

@renkun-ken

Description

I'm working on a production system in which a basic data.table (nearly 200 columns and 6M rows) is generated at the beginning, then tens of scripts operate on this data.table, computing derivative variables in place and producing a final column of values. To avoid deep-copying the data.table, each time I use dt[TRUE] to make a shallow copy and then add new derived columns to it. As documented, this does not prevent in-place modification of existing columns in the original dt. Therefore I'm wondering if there's a way to protect the existing columns from modification.

A basic workflow looks like this:

dt <- generate_data() # a big data.table

run("script-1.R")
run("script-2.R")
# ...
run("script-100.R")

where run() uses sys.source() to evaluate a given script file in a sandbox environment.
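For concreteness, a minimal sketch of such a run() helper, assuming each script is evaluated in a fresh sandbox environment whose parent can see dt; the function body and its arguments are illustrative, not from the original post:

```r
# Hypothetical sketch of run(): evaluate one script file inside a
# fresh environment so its temporary variables do not leak, while
# objects in the global environment (such as dt) remain visible
# through the parent chain.
run <- function(script) {
  sandbox <- new.env(parent = globalenv())
  sys.source(script, envir = sandbox)
  invisible(sandbox)  # return the sandbox so its results can be inspected
}
```

A script sourced this way can read dt from the global environment, but its own assignments stay inside the returned sandbox environment.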

In each script-n.R, the code looks like this:

ft <- dt[TRUE]
ft[, x1 := ..., by = col1]
ft[, x2 := ..., by = col2]
# ...
ft[, x := x1 + x2 * abs(x3 - x4)]

where none of the columns modified in ft are supposed to exist in dt, so that they are added without modifying any of the pre-existing columns in dt.

I know the safest approach is to copy() dt every time, but that is simply too time-consuming since the production run is also time-critical. So the question is: is there a way to protect all columns in dt so that ft[, x1 := ...] only allows new columns to be added and prevents changing any column of dt?
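One defensive option in the meantime (purely illustrative, not a data.table feature) is to validate the column names a script intends to create against the protected columns of dt before any := assignment, failing fast on a collision:

```r
# Hypothetical guard, not part of data.table: stop if any column a
# script wants to create already exists among the protected columns,
# so a `:=` cannot silently overwrite data shared with dt.
assert_new_cols <- function(new_cols, protected_cols) {
  clash <- intersect(new_cols, protected_cols)
  if (length(clash) > 0) {
    stop("refusing to modify protected column(s): ",
         paste(clash, collapse = ", "))
  }
  invisible(TRUE)
}
```

Each script would call, say, assert_new_cols(c("x1", "x2"), names(dt)) before assigning. This catches name collisions up front, but it cannot detect in-place writes made through other references to the same columns.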

Metadata

Assignees

No one assigned

Labels

by-reference (Issues related to by-reference/copying behavior)
