This arose on SO recently in connection with dplyr but I was wondering about it with reference to data.table.
This data.table code works to operate only on the numeric columns:
library(data.table)
iris.dt <- as.data.table(iris)
iris.dt[, .SD / rowSums(.SD), .SDcols = sapply(iris.dt, is.numeric)]
but it requires knowing the name of the table in order to refer to it in the .SDcols argument. Often there is no name to refer to, for example in the later steps of a chain like DT[...][...]. I would have liked to write:
iris.dt[, .SD / rowSums(.SD), .SDcols = sapply(.SD, is.numeric)]
where the last .SD (the one in the third argument) refers to the entire table and the others are modified by .SDcols. Is there a good way to do this or is there something already available for this? If not, I suggest adding the possibility of referencing .SD in the .SDcols argument.
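For reference, here is a minimal sketch of one possible way to avoid naming the table. It assumes an installed data.table version in which .SDcols accepts a function that is applied to each column to select the columns of .SD. I believe recent releases support this, but check ?data.table for your version before relying on it. The Species filter in the second call is only there to illustrate a chain.

```r
library(data.table)

iris.dt <- as.data.table(iris)

# Assumption: this data.table version accepts a function in .SDcols,
# which is applied to every column to choose the columns of .SD.
res <- iris.dt[, .SD / rowSums(.SD), .SDcols = is.numeric]

# The same form can be used in a later step of a chain,
# where the intermediate table has no name.
res2 <- iris.dt[Species != "setosa"][, .SD / rowSums(.SD), .SDcols = is.numeric]
```

Like the original snippet, this still drops the non-numeric Species column from the result.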
Also, what I really want is to compute .SD / rowMeans(.SD) on the numeric columns without dropping the non-numeric columns, and neither of the snippets above does that.
This one does preserve the non-numeric columns but it seems ugly and verbose:
iris.dt[, {
  is.num <- sapply(.SD, is.numeric)
  SDnum <- .SD[, is.num, with = FALSE]
  replace(.SD, is.num, SDnum / rowMeans(SDnum))
}]
This also works but does not seem very data.table-like:
iris.dt <- as.data.table(iris)
nums <- which(sapply(iris.dt, is.numeric))
iris.num <- iris.dt[, nums, with = FALSE]
iris.dt[, nums] <- iris.num / rowMeans(iris.num)
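A more data.table-flavoured variant of the same idea might use := to overwrite the numeric columns by reference. This is only a sketch: num.cols is a name introduced here for illustration, and it still needs the table name to find the numeric columns, so it does not solve the chaining problem.

```r
library(data.table)

iris.dt <- as.data.table(iris)

# Names of the numeric columns, computed once up front
# (num.cols is an illustrative name, not from the code above).
num.cols <- names(iris.dt)[sapply(iris.dt, is.numeric)]

# := overwrites just those columns by reference; Species is left untouched.
iris.dt[, (num.cols) := .SD / rowMeans(.SD), .SDcols = num.cols]
```

Because := modifies iris.dt in place, take a copy() first if the original values are still needed.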