Need an easier way to in-place merge multiple columns #3184

@renkun-ken

In-place merge in the form of dt1[dt2, x := y, on = .(col1, col2)] is useful when dt1 is very large. It also supports merging multiple columns from dt2 using `:=`(x1 = y1, x2 = y2). However, when I need to merge many columns from dt2 into dt1, the only option seems to be listing every column explicitly; the column names cannot be determined dynamically from a character vector, as is possible with .SD. Otherwise I have to use meta-programming facilities to generate an expression and evaluate it.

One simple example is as follows. A practical use case is when dt1 and dt2 are very large and using merge would make a copy, which is very slow and may exceed the memory limit (which is exactly why in-place operations were introduced).

library(data.table)

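# d1: the large table, with columns id, x1, ..., x10
d1 <- data.table(id = 1:10)
for (i in 1:10) {
  d1[, paste0("x", i) := rnorm(.N)]
}

# d2: the table to merge from, with columns id, y1, ..., y5
d2 <- data.table(id = 3:6)
for (i in 1:5) {
  d2[, paste0("y", i) := rnorm(.N)]
}

# Target names can be built dynamically, but the source columns must be listed one by one
d1[d2, paste0("z", 1:5) := list(y1, y2, y3, y4, y5), on = "id"]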
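For comparison, one workaround that is commonly suggested is to look up the source columns by name with mget(), using the i. prefix to refer to columns of the table being joined in. The following is only a sketch under that assumption, not an official API for this use case; the names src and tgt are made up for illustration, and the behaviour may depend on the data.table version.

```r
library(data.table)

set.seed(1)
d1 <- data.table(id = 1:10)
d2 <- data.table(id = 3:6, y1 = rnorm(4), y2 = rnorm(4), y3 = rnorm(4))

src <- paste0("y", 1:3)  # source columns in d2 (hypothetical helper name)
tgt <- paste0("z", 1:3)  # target columns to create in d1 (hypothetical helper name)

# (tgt) makes := take the target names from the character vector;
# mget() returns the i.-prefixed join columns as a list, in the order of src
d1[d2, (tgt) := mget(paste0("i.", src)), on = "id"]
```

Rows of d1 with no match in d2 get NA in the new columns, the same as with the explicit list(y1, y2, ...) form.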

A similar problem is merging all columns of d2 in place without specifying the source and target column names.
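As a rough sketch of what this second case might look like today, the column names can be derived from names(d2) and passed through mget() with the i. prefix. This assumes the join keys are known up front; key_cols and cols are illustrative names, and this is a workaround rather than a built-in feature.

```r
library(data.table)

set.seed(1)
d1 <- data.table(id = 1:10, x1 = rnorm(10))
d2 <- data.table(id = 3:6, y1 = rnorm(4), y2 = rnorm(4))

key_cols <- "id"                         # join keys (assumed known)
cols <- setdiff(names(d2), key_cols)     # every non-key column of d2

# Add all non-key columns of d2 to d1 by reference, keeping their names
d1[d2, (cols) := mget(paste0("i.", cols)), on = key_cols]
```

Note that any column of d1 that shares a name with a column in cols would be overwritten for the matched rows, so the names may need to be checked or remapped first.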

Labels: programming (parameterizing queries: get, mget, eval, env), tests
