calculating time difference by group might get the units messed up #3694

@oliver-oliver

# Minimal reproducible example

The objective is to calculate the time between events grouped by some id. Here is an example:

library(data.table)
library(lubridate)

dt <- data.table(id = c(1,1:3), 
                 start = c("2015-01-01 12:00:00", "2015-12-01 12:00:00", "2019-01-01 12:00:00", NA),
                 end = c("2016-01-01 12:00:01", "2016-01-01 12:00:01", "2019-01-01 12:00:01", "2019-01-01 12:00:02"))

dt[, start := ymd_hms(start)]
dt[, end := ymd_hms(end)]

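# Grouped: each group's subtraction picks its own units (days for id 1, secs for ids 2 and 3)
dt[, time_diff_1 := min(end) - max(start), by = .(id)]
# Ungrouped: a single vectorised subtraction, so one unit (secs) applies to every row
dt[, time_diff_2 := end - start]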

which results in:

   id               start                 end   time_diff_1   time_diff_2
1:  1 2015-01-01 12:00:00 2016-01-01 12:00:01 31.00001 secs 31536001 secs
2:  1 2015-12-01 12:00:00 2016-01-01 12:00:01 31.00001 secs  2678401 secs
3:  2 2019-01-01 12:00:00 2019-01-01 12:00:01  1.00000 secs        1 secs
4:  3                <NA> 2019-01-01 12:00:02       NA secs       NA secs

Both columns time_diff_1 and time_diff_2 are displayed in seconds. However, time_diff_1, which comes from the grouped calculation, has its units mixed up: the value for id == 1 is actually 31 days and one second, not 31.00001 seconds. It seems the units were chosen automatically per group and then overwritten when the group results were combined.
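
The per-group unit choice can be seen in base R alone. The following is a minimal sketch, not taken from the report: it uses `as.POSIXct()` with `tz = "UTC"` instead of lubridate, and the variable names `d1` and `d2` are illustrative. It reuses two of the start/end pairs from the example to show that subtracting date-times picks the unit from the size of the difference.

```r
# Base R only: `-` on POSIXct values goes through difftime() with units = "auto",
# so the unit depends on how large the difference is.
start <- as.POSIXct(c("2015-12-01 12:00:00", "2019-01-01 12:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2016-01-01 12:00:01", "2019-01-01 12:00:01"), tz = "UTC")

d1 <- end[1] - start[1]   # 31 days and 1 second
d2 <- end[2] - start[2]   # 1 second

units(d1)  # "days"
units(d2)  # "secs"

# Dropping the class keeps only the bare number, so the unit information is gone
as.numeric(d1)  # roughly 31.00001
as.numeric(d2)  # 1
```

If the bare numbers from different groups are stacked into one column that carries a single units attribute, 31.00001 ends up labelled as seconds, which matches the output shown above.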

To prevent this, one can use difftime() with an explicit units argument. However, I think there is room for improvement, e.g. a warning message when units do not match across groups.
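
A minimal sketch of that workaround, assuming the same data as in the example (built here with `as.POSIXct()` and `tz = "UTC"` rather than lubridate; the column name `time_diff_secs` is illustrative): passing `units = "secs"` to `difftime()` makes every group return the same unit, so nothing is left to the automatic choice.

```r
library(data.table)

dt <- data.table(
  id    = c(1, 1, 2, 3),
  start = as.POSIXct(c("2015-01-01 12:00:00", "2015-12-01 12:00:00",
                       "2019-01-01 12:00:00", NA), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-01 12:00:01", "2016-01-01 12:00:01",
                       "2019-01-01 12:00:01", "2019-01-01 12:00:02"), tz = "UTC")
)

# Fix the unit explicitly so that every group is computed in seconds
dt[, time_diff_secs := difftime(min(end), max(start), units = "secs"), by = id]

# Both rows of id 1 now hold 2678401 secs (31 days and one second)
dt
```

Wrapping the result in `as.numeric()` would be another option if a plain numeric column in a known unit is preferred over a difftime column.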

# Output of sessionInfo()

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.6.0      data.table_1.10.4    RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] compiler_3.4.0   magrittr_1.5     RevoUtils_10.0.4 tools_3.4.0      stringi_1.1.5    stringr_1.2.0
