Did a bunch of coalescing today and was sorely missing an efficient version.
@HughParsonage, would you be happy to add hutils::coalesce to data.table? I have the discussion in #2677 in mind...
I tinkered with hutils::coalesce to come up with:
coalesce <- function(x, ...) {
  # Nothing to do if x has no NAs or no replacement values were supplied
  if (!any(na_idx <- is.na(x)) || missing(..1)) return(x)
  values <- list(...)
  lx <- length(x)
  lengths <- c(lx, vapply(values, length, FUN.VALUE = 0L))
  lengthsn1 <- lengths != 1L
  if (any(lengthsn1 & lengths != lx)) {
    wrong_len_i <- which(lengthsn1 & lengths != lx)
    stop("Argument ", wrong_len_i[1], " had length ", lengths[wrong_len_i[1]], ", ",
         "but length(x) = ", lx, ". ",
         "The only permissible lengths in ... are 1 or the length of `x` (", lx, ").")
  }
  typeof_x <- typeof(x)
  x_not_factor <- !inherits(x, what = 'factor')
  lv <- length(values)
  for (i in seq_len(lv)) {
    vi <- values[[i]]
    if (typeof(vi) != typeof_x) {
      stop("Argument ", i + 1L, " had type '", typeof(vi), "' but ",
           "typeof(x) was ", typeof_x, ". All types ",
           "in `...` must be the same type.")
    }
    if (inherits(vi, what = "factor") && x_not_factor) {
      stop("Argument ", i + 1L, " was a factor, but `x` was not. ",
           "All `...` must be the same type.")
    }
    if (lengthsn1[i + 1L]) {
      # Full-length replacement: only look at positions that are still NA
      has_value_idx <- !is.na(vi[na_idx])
      x[na_idx][has_value_idx] <- vi[na_idx][has_value_idx]
      if (all(has_value_idx)) break
      # has_value_idx is relative to the TRUE entries of na_idx, so index
      # through na_idx first; na_idx[has_value_idx] would recycle incorrectly
      na_idx[na_idx][has_value_idx] <- FALSE
    } else {
      # Length-1 replacement: a non-NA scalar fills every remaining gap
      if (is.na(vi)) next
      else x[na_idx] <- vi
      break
    }
  }
  x
}
The main difference is that it skips running anyNA on every iteration and instead focuses on "whittling down" the is.na(x) vector.
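To make the "whittling down" bookkeeping concrete, here is a tiny standalone sketch (the vectors `x`, `v1`, `v2` are made up for illustration, and this is not the function itself). The point to watch is that the "filled" mask is only as long as the number of still-missing positions, so it has to be applied through `na_idx` rather than directly to it:

```r
x  <- c(1, NA, NA, 4)
v1 <- c(NA, 2, NA, NA)
v2 <- c(9, 9, 3, 9)

na_idx <- is.na(x)                      # positions of x still missing
for (v in list(v1, v2)) {
  cand   <- v[na_idx]                   # candidates only at missing positions
  filled <- !is.na(cand)                # relative to the TRUE entries of na_idx
  x[na_idx][filled] <- cand[filled]     # fill the gaps this candidate covers
  na_idx[na_idx][filled] <- FALSE       # shrink the mask in the same coordinates
  if (!any(na_idx)) break               # nothing left to fill
}
x
```

Position 2 is filled from `v1` and is then dropped from the mask, so `v2` only gets a chance at position 3.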
Benchmarked against hmisc::coalesce and it's hit or miss... maybe need more replications (function evaluation takes at most around 2 seconds so this is doable)? Or there's some extra optimization I'm missing...
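For the "more replications" idea, a rough base-R timing harness could look like the sketch below. Everything in it is an assumption for illustration: `coalesce_rescan` and `coalesce_whittle` are simplified stand-ins (no type or length checks) rather than the real implementations, and the data size, NA rate, and number of replications are arbitrary.

```r
# Hypothetical baseline: rescans the whole of x for NAs on every pass.
coalesce_rescan <- function(x, ...) {
  for (v in list(...)) {
    if (!anyNA(x)) break
    idx <- is.na(x)
    x[idx] <- rep_len(v, length(x))[idx]
  }
  x
}

# Hypothetical whittling variant: keeps one mask and shrinks it as gaps fill.
coalesce_whittle <- function(x, ...) {
  na_idx <- is.na(x)
  for (v in list(...)) {
    if (!any(na_idx)) break
    cand   <- rep_len(v, length(x))[na_idx]
    filled <- !is.na(cand)
    x[na_idx][filled] <- cand[filled]
    na_idx[na_idx][filled] <- FALSE
  }
  x
}

set.seed(1)
n  <- 1e5
mk <- function() { v <- runif(n); v[sample.int(n, n / 2)] <- NA; v }
x <- mk(); y <- mk(); z <- mk()

# Both variants should agree before their timings are worth comparing.
same_result <- identical(coalesce_rescan(x, y, z), coalesce_whittle(x, y, z))

# Median elapsed time over several replications, to damp run-to-run noise.
time_it <- function(f, reps = 20L) {
  median(replicate(reps, system.time(f(x, y, z))[["elapsed"]]))
}
timings <- c(rescan = time_it(coalesce_rescan), whittle = time_it(coalesce_whittle))
print(timings)
```

Taking the median over replications should make a "hit or miss" comparison less sensitive to a single slow run; varying the NA rate and the number of arguments would show where (if anywhere) the mask approach pays off.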
Anyway, the same logic would probably be faster in C...