[Request] speed up as.ITime for string inputs #2156

@rossholmberg

I've noticed that as.ITime is surprisingly slow when given character inputs (i.e. strings of the format %H:%M:%S), and discovered that it's actually much faster to convert strings to ITime via an intermediate step. I'll show the time difference first:

# set up some inputs (setattr and as.ITime come from data.table)
library( data.table )
set.seed( 1 )
x <- as.character( setattr(
    sample( seq_len( 24*60*59 ), 1E5, replace = TRUE ),
    "class",
    "ITime"
) )
head( x )
# [1] "06:15:58" "08:46:56" "13:31:10" "21:26:02" "04:45:35" "21:12:08"

Doing a direct conversion using as.ITime is very slow compared to using an intermediate as.chron.ITime step (the same is true when using chron::times first):

microbenchmark::microbenchmark(
    direct = { direct <- as.ITime( x ) },
    twostep = { twostep <- as.ITime( as.chron.ITime( x ) ) },
    times = 10
)
# (I'll just show the median times here to make it easier to read)
# Unit: milliseconds
#     expr     median
#  direct   1808.9899
# twostep    115.9776

# check the output
identical( direct, twostep )
# [1] TRUE

Notice the significant speed increase (~15x at the median, roughly 1809 ms vs 116 ms), just by adding an intermediate step.

The same can be achieved in a less direct manner, by converting the numeric times values (fractions of a day) to integer seconds, then converting to ITime with setattr instead of as.ITime. This way ("fivestep") gives basically the same speed increase, so I'm really just adding it here as an option.

microbenchmark::microbenchmark(
    twostep = { twostep <- as.ITime( as.chron.ITime( x ) ) },
    fivestep = { 
        fivestep <- as.integer( round( as.chron.ITime( x ) * 86400 ) )
        setattr( fivestep, "class", "ITime" )
    },
    times = 100
)

# Unit: milliseconds
#     expr   median
#   twostep  122.2765
#  fivestep  119.9396

identical( direct, fivestep )
# [1] TRUE

My suggestion would be to build this into as.ITime.character, but my own attempts have failed to maintain the same reliability as the existing function.
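
For illustration only, here is a minimal base-R sketch of what a fast path for strictly formatted input might look like. It is not the chron route benchmarked above and not data.table's implementation: `fast_hms_to_seconds` is a hypothetical helper, and it assumes every element is exactly "HH:MM:SS" with zero-padded fields and no NA or fractional seconds. It returns the integer seconds since midnight that the "fivestep" version above tags with the "ITime" class.

```r
# Hypothetical helper (not part of data.table or chron).
# Assumes strict, zero-padded "HH:MM:SS" input; no validation is done.
fast_hms_to_seconds <- function( x ) {
    h <- as.integer( substr( x, 1L, 2L ) )  # hours
    m <- as.integer( substr( x, 4L, 5L ) )  # minutes
    s <- as.integer( substr( x, 7L, 8L ) )  # whole seconds
    h * 3600L + m * 60L + s                 # integer seconds since midnight
}

secs <- fast_hms_to_seconds( c( "06:15:58", "12:00:00" ) )
secs
# [1] 22558 43200
```

As in the "fivestep" example, the result could then be tagged with setattr( secs, "class", "ITime" ). Any real implementation would still need to handle other formats, NA values and malformed strings, which is where the reliability concern comes in.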

NOTE: This does have the complication that it works perfectly when the format is "%H:%M:%S", but I believe the round may need to be floor to stay consistent when the input format is "%H:%M:%OS". round is still more appropriate in my opinion, and would be more consistent with chron::times, but as.ITime currently rounds down, so floor would preserve that behaviour for the input format "%H:%M:%OS". E.g.:

data.table::as.ITime( "12:00:00.99" )
# [1] "12:00:00"

chron::times( "12:00:00.99" )
# [1] 12:00:01

data.table::as.ITime( chron::times( "12:00:00.99" ) )
# [1] "12:00:01"
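
The difference between the two choices can be sketched in base R, without either package. This assumes the time is held as a fraction of a day, which is what the `* 86400` in the "fivestep" conversion above relies on; the variable names are only for illustration.

```r
# "12:00:00.99" expressed as a fraction of a day (assumed representation)
frac_day <- ( 12 * 3600 + 0.99 ) / 86400

rounded <- as.integer( round( frac_day * 86400 ) )  # nearest second
floored <- as.integer( floor( frac_day * 86400 ) )  # truncate fractional seconds

rounded
# [1] 43201
floored
# [1] 43200
```

43201 seconds is 12:00:01 (the chron::times result) and 43200 seconds is 12:00:00 (the current as.ITime result). One caveat with floor: a whole-second value reconstructed from a day fraction can land slightly below the integer because of floating-point error, so a floor-based version may need extra care for plain "%H:%M:%S" inputs.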
