Column type error when combining max(na.rm = TRUE) with any(is.na()) in grouped operation #6608

@dcolombara

Description

I've encountered an error when combining max(..., na.rm = TRUE) with any(is.na(...)) in the same grouped operation. The operation fails with a type error, but works fine when na.rm = FALSE.

Expected behavior

The second operation below should work the same as the first one, just handling NAs differently via na.rm = TRUE.

Observed behavior

The operation fails with a type error suggesting column type inconsistency across groups, even though all input columns are integers.

Column 1 of result for group 3 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.

Minimal reproducible example

Create sample data

library(data.table)

dt_test = data.table(
  hh_id = rep(1:3, each = 3),
  age = c(65L, 65L, NA_integer_,
          45L, 45L, 45L,
          NA_integer_, NA_integer_, NA_integer_), 
  disability = sample(0L:1L, 9, replace = TRUE)
)

Works (when na.rm = FALSE)

dt_test[, .(
  age = max(age, na.rm = FALSE)
  ,any_disability = max(disability, na.rm = FALSE)
  ,any_disabilityNA = any(is.na(disability))
), by = hh_id]

Fails (when na.rm = TRUE)

dt_test[, .(
  age = max(age, na.rm = TRUE)
  ,any_disability = max(disability, na.rm = TRUE)
  ,any_disabilityNA = any(is.na(disability))
), by = hh_id, verbose = TRUE]

Works (when the any(...) line is commented out)

dt_test[, .(
  age = max(age, na.rm = TRUE)
  , any_disability = max(disability, na.rm = TRUE)
  #, any_disabilityNA = any(is.na(disability))
), by = hh_id]

Works (when the max(age) line is commented out)

dt_test[, .(
  # age = max(age, na.rm = TRUE),
  any_disability = max(disability, na.rm = TRUE)
  , any_disabilityNA = any(is.na(disability))
), by = hh_id]
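
A possible explanation, offered as an assumption rather than something confirmed in this report: every age value in hh_id 3 is NA, and in base R max() over an all-NA integer vector with na.rm = TRUE has nothing left to compare, so it returns -Inf, which is a double. The sketch below shows that base R behavior without data.table. The helper safe_max_int is hypothetical (it is not part of data.table) and only illustrates one way to keep the result an integer.

```r
# Base R: with na.rm = TRUE an all-NA integer vector becomes empty,
# so max() returns -Inf (a double) and emits a warning.
x <- c(NA_integer_, NA_integer_, NA_integer_)
res <- suppressWarnings(max(x, na.rm = TRUE))
typeof(res)       # "double"
is.infinite(res)  # TRUE

# Hypothetical helper (not part of data.table): returns NA_integer_
# for an all-NA group so every group yields an integer.
safe_max_int <- function(v) {
  if (all(is.na(v))) NA_integer_ else max(v, na.rm = TRUE)
}

typeof(safe_max_int(x))                          # "integer"
typeof(safe_max_int(c(65L, 65L, NA_integer_)))   # "integer"
```

If this is indeed the cause, using safe_max_int(age) in place of max(age, na.rm = TRUE) in the failing call may serve as a workaround, since each group would then return an integer.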

Output of sessionInfo()

> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.16.2

loaded via a namespace (and not attached):
[1] compiler_4.4.1 tools_4.4.1  
