Reference original table when specifying .SDcols #1786

@ggrothendieck

This arose on SO recently in connection with dplyr but I was wondering about it with reference to data.table.

This data.table code works to operate only on the numeric columns:

library(data.table)
iris.dt <- as.data.table(iris)
iris.dt[, .SD / rowSums(.SD), .SDcols = sapply(iris.dt, is.numeric)]

but it requires knowing the name of the table in order to specify it in the .SDcols portion. Often that is not possible: in a chain such as DT[...][...], the intermediate result has no name to refer to. I would have liked to write:

iris.dt[, .SD / rowSums(.SD), .SDcols = sapply(.SD, is.numeric)]

where the last .SD (the one in the third argument) refers to the entire table and the others are modified by .SDcols. Is there a good way to do this or is there something already available for this? If not, I suggest adding the possibility of referencing .SD in the .SDcols argument.
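For what it's worth, here is a minimal sketch of one way around the naming problem. It assumes a data.table release in which .SDcols accepts a function that is applied to each column to pick the ones to keep; treat that as an assumption about the installed version and check ?data.table. The row filter on Species is only there to produce an unnamed intermediate result for illustration.

```r
library(data.table)

iris.dt <- as.data.table(iris)

# The first [...] yields an unnamed intermediate table. In the second [...],
# .SDcols is given as a function, so no table name is needed to pick the
# numeric columns.
res <- iris.dt[Species != "setosa"][, .SD / rowSums(.SD), .SDcols = is.numeric]

head(res)
```

Note that this still drops the non-numeric columns, just like the named version does.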

Also, what I really want is to compute .SD / rowMeans(.SD) on the numeric columns without dropping the non-numeric columns, and neither of these "solutions" does that.

This one does preserve the non-numeric columns but it seems ugly and verbose:

iris.dt[, { is.num <- sapply(.SD, is.numeric)
            SDnum <- .SD[, is.num, with = FALSE]
            replace(.SD, is.num, SDnum / rowMeans(SDnum))
          } ]

This also works but does not seem very data.table like:

iris.dt <- as.data.table(iris)
nums <- which(sapply(iris.dt, is.numeric))
iris.num <- iris.dt[, nums, with = FALSE]
iris.dt[, nums] <- iris.num / rowMeans(iris.num)
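A possibly more data.table-like variant of the same idea is to assign back into the numeric columns with `:=`, which leaves the non-numeric columns in place. This is only a sketch: `num_cols` is a name introduced here for illustration, the table is modified by reference, and the table name is still needed to find the numeric columns, so it does not address the chaining problem.

```r
library(data.table)

iris.dt <- as.data.table(iris)

# Names of the numeric columns (num_cols is an illustrative helper name)
num_cols <- names(iris.dt)[sapply(iris.dt, is.numeric)]

# Overwrite only the numeric columns in place; Species is left untouched
iris.dt[, (num_cols) := .SD / rowMeans(.SD), .SDcols = num_cols]

head(iris.dt)
```

Because `:=` updates iris.dt by reference, take a `copy()` first if the original values are still needed.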
