# Minimal reproducible example
The objective is to calculate the time between events grouped by some id. Here is an example:
library(data.table)
library(lubridate)
dt <- data.table(id = c(1,1:3),
start = c("2015-01-01 12:00:00", "2015-12-01 12:00:00", "2019-01-01 12:00:00", NA),
end = c("2016-01-01 12:00:01", "2016-01-01 12:00:01", "2019-01-01 12:00:01", "2019-01-01 12:00:02"))
dt[, start := ymd_hms(start)]
dt[, end := ymd_hms(end)]
dt[, time_diff_1 := min(end) - max(start), by = .(id)]
dt[, time_diff_2 := end - start]
which results in:
id start end time_diff_1 time_diff_2
1: 1 2015-01-01 12:00:00 2016-01-01 12:00:01 31.00001 secs 31536001 secs
2: 1 2015-12-01 12:00:00 2016-01-01 12:00:01 31.00001 secs 2678401 secs
3: 2 2019-01-01 12:00:00 2019-01-01 12:00:01 1.00000 secs 1 secs
4: 3 <NA> 2019-01-01 12:00:02 NA secs NA secs
Both time_diff_1 and time_diff_2 are displayed in seconds. However, time_diff_1, which comes from the grouped calculation, has its units mixed up: the correct result for id == 1 is 31 days and one second, yet it is shown as 31.00001 secs. It seems as if the units were chosen automatically per group and then overwritten.
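The automatic unit selection can be seen in base R alone, without data.table. The following is a minimal sketch using the same timestamps as the example; the variable names `diff_id1` and `diff_id2` are made up for illustration. It only shows that subtracting two date-times picks the unit from the size of the difference. It does not show how data.table combines the per-group results internally.

```r
# Same timestamps as in the example, parsed with base R (UTC assumed here).
diff_id1 <- as.POSIXct("2016-01-01 12:00:01", tz = "UTC") -
  as.POSIXct("2015-12-01 12:00:00", tz = "UTC")
diff_id2 <- as.POSIXct("2019-01-01 12:00:01", tz = "UTC") -
  as.POSIXct("2019-01-01 12:00:00", tz = "UTC")

# The unit is chosen automatically from the magnitude of the difference.
units(diff_id1)  # "days"
units(diff_id2)  # "secs"

# The bare numbers are therefore not comparable without their units.
as.numeric(diff_id1)  # about 31.00001 (days)
as.numeric(diff_id2)  # 1 (seconds)
```

If the numeric value from the first group is kept but labelled with the unit of another group, the result would look like the 31.00001 secs shown in the output.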
To prevent this, one can use difftime() with an explicit units argument. However, I think there is room for improvement, e.g. a warning message when the units do not match across groups.
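A sketch of the difftime() workaround, assuming the same data as in the example. The timestamps are parsed with base `as.POSIXct()` in UTC instead of lubridate so the snippet only depends on data.table, and the column names `time_diff_secs` and `time_diff_num` are made up for illustration.

```r
library(data.table)

dt <- data.table(
  id    = c(1, 1:3),
  start = as.POSIXct(c("2015-01-01 12:00:00", "2015-12-01 12:00:00",
                       "2019-01-01 12:00:00", NA), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-01 12:00:01", "2016-01-01 12:00:01",
                       "2019-01-01 12:00:01", "2019-01-01 12:00:02"), tz = "UTC")
)

# Fix the unit explicitly so that every group is computed in seconds.
dt[, time_diff_secs := difftime(min(end), max(start), units = "secs"), by = .(id)]

# Alternative: drop the difftime class and store plain numbers (seconds).
dt[, time_diff_num := as.numeric(difftime(min(end), max(start), units = "secs")),
   by = .(id)]
```

With the unit fixed, the value for id == 1 is 2678401 seconds (31 days and one second) in both columns. Storing plain numbers avoids relying on the units attribute at all, at the cost of having to document the unit elsewhere.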
# Output of sessionInfo()
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.6.0 data.table_1.10.4 RevoUtilsMath_10.0.0
loaded via a namespace (and not attached):
[1] compiler_3.4.0 magrittr_1.5 RevoUtils_10.0.4 tools_3.4.0 stringi_1.1.5 stringr_1.2.0