data.table-style coalesce

Did a bunch of `coalesce`ing today and was sorely missing an efficient version.

@HughParsonage, would you be happy to add `hutils::coalesce` to `data.table`? I have the discussion in #2677 in mind...

I tinkered with `hutils::coalesce` to come up with:

```
coalesce <- function(x, ...) {
  if (!any(na_idx <- is.na(x)) || missing(..1)) return(x)
  values <- list(...)
  
  lx <- length(x)
  lengths <- c(lx, vapply(values, length, FUN.VALUE = 0L))
  lengthsn1 <- lengths != 1L
  if (any(lengthsn1 & lengths != lx)) {
    wrong_len_i <- which(lengthsn1 & lengths != lx)
    stop("Argument ", wrong_len_i[1], " had length ", lengths[wrong_len_i[1]], ", ",
         "but length(x) = ", lx, ". ",
         "The only permissible lengths in ... are 1 or the length of `x` (", lx, ").")
  }
  
  typeof_x <- typeof(x)
  x_not_factor <- !inherits(x, what = 'factor')
  lv <- length(values)
  
  for (i in seq_len(lv)) {
    vi <- values[[i]]
    if (typeof(vi) != typeof_x) {
      stop("Argument ", i + 1L, " had type '", typeof(vi), "' but ",
           "typeof(x) was ", typeof_x, ". All types ",
           "in `...` must be the same type.")
    }
    
    if (inherits(vi, what = "factor") && x_not_factor) {
      stop("Argument ", i + 1L, " was a factor, but `x` was not. ",
           "All `...` must be the same type.")
    }
    
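    if (lengthsn1[i + 1L]) {
      # vi has the same length as x: fill whichever still-missing slots it can
      has_value_idx <- !is.na(vi[na_idx])
      x[na_idx][has_value_idx] <- vi[na_idx][has_value_idx]
      if (all(has_value_idx)) break
      # has_value_idx is relative to the TRUE entries of na_idx, so map it
      # back to positions in x before marking those slots as filled
      na_idx[which(na_idx)[has_value_idx]] <- FALSE
    } else {
      # vi is a scalar: a non-NA value fills every remaining slot, so we're done
      if (is.na(vi)) next
      x[na_idx] <- vi
      break
    }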
  }
  x
}
```

The main difference is that it skips running `anyNA` every iteration and instead focuses on "whittling down" the `is.na(x)` vector.
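To show the "whittling down" idea on its own, here's a minimal sketch. It is not the function above: `whittle_fill` is a hypothetical name, it tracks integer positions rather than a logical mask, and it assumes every candidate has the same length as `x` (no length or type checks).

```r
# Hypothetical helper: fill NAs in x from successive candidates, tracking the
# still-missing positions instead of re-scanning all of x on each iteration.
whittle_fill <- function(x, ...) {
  na_pos <- which(is.na(x))            # positions of x that are still missing
  for (v in list(...)) {
    if (!length(na_pos)) break         # nothing left to fill
    found <- !is.na(v[na_pos])         # which missing positions v can fill
    x[na_pos[found]] <- v[na_pos[found]]
    na_pos <- na_pos[!found]           # whittle down the remaining set
  }
  x
}

whittle_fill(c(NA, 1, NA, NA), c(NA, 9, 3, NA), c(7, 8, 9, NA))
# [1]  7  1  3 NA
```

Each candidate only gets inspected at the positions that are still missing, so later candidates do less work as the set shrinks.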

Benchmarked against `hutils::coalesce` and it's hit or miss... maybe it needs more replications (function evaluation takes at most around 2 seconds, so this is doable)? Or there's some extra optimization I'm missing...
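For the "more replications" part, something like the following base-R harness might do. It's only a sketch: `time_reps` and `baseline` are hypothetical names, the `ifelse` baseline is a stand-in for whichever implementations are actually being compared, and the input size is kept small here so it runs quickly.

```r
# Hypothetical harness: call f(...) `times` times and return the elapsed
# wall-clock seconds of each run.
time_reps <- function(f, ..., times = 20L) {
  out <- numeric(times)
  for (i in seq_len(times)) {
    out[i] <- system.time(f(...))[["elapsed"]]
  }
  out
}

set.seed(1)
n <- 1e5  # assumption: scale this up to match the real workload
x <- sample(c(NA, 1:5), n, replace = TRUE)
y <- sample(c(NA, 1:5), n, replace = TRUE)

# Stand-in baseline; swap in the functions under comparison.
baseline <- function(x, y) ifelse(is.na(x), y, x)

timings <- time_reps(baseline, x, y, times = 20L)
summary(timings)
```

Comparing the median (or the whole distribution) across runs should be less noisy than eyeballing a handful of single timings.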

Anyway, the same logic would probably be faster in C...

data.table-style coalesce #3424

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

data.table-style coalesce #3424

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions