
Scope for %plike% #3702

@KyleHaynes


Just reading the new dev notes and noticed #3333. I was actually going to feature request %likep% the other day (naming it %plike% would probably make more sense, to conform with #3333), but decided against it, as I thought the consensus might be that fewer convenience wrappers were better for data.table. Is there any particular reason why data.table can't incorporate another one, leveraging the `perl = TRUE` argument?

Setting `perl = TRUE` often gives considerable speed improvements, along with a number of other regex features and behaviors (e.g. lookarounds). A quick benchmark:

```r
# The following packages are required.
# install.packages(c("stringi", "microbenchmark"))

# Load data.table.
library(data.table)

# Create a data.table of 100,000 random strings (20 chars in length).
DT = data.table(x = stringi::stri_rand_strings(100000, 20))

# Define a trivial regex pattern.
regex_pattern = "car|blah|far|nah"

# Create an alternative to %like% that sets `perl = TRUE`.
`%likep%` = function(vector, pattern) {
    if (is.factor(vector)) {
        # For factors, match against the levels once, then map back via the integer codes.
        as.integer(vector) %in% grep(pattern, levels(vector), perl = TRUE)
    } else {
        grepl(pattern, vector, perl = TRUE)
    }
}

# Microbenchmark the results to demonstrate speed improvements.
microbenchmark::microbenchmark(
    like  = DT[x %like% regex_pattern],
    likep = DT[x %likep% regex_pattern]
)
# Unit: milliseconds
#   expr     min       lq     mean   median       uq      max neval
#   like 84.1235 86.56265 91.51547 87.74410 91.16710 159.6292   100
#  likep 16.0932 16.64750 17.81476 16.95985 17.82195  34.1415   100
```
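Beyond speed, a small illustration of the "other features" point: a minimal sketch (the strings and pattern here are made up purely for illustration) showing that lookahead syntax is only accepted when `perl = TRUE` is set. The default engine used by `grepl()` (TRE) rejects it with an "invalid regular expression" error.

```r
# Hypothetical example strings.
x = c("foobar", "foobaz", "barfoo")

# Lookahead: match "foo" only when it is immediately followed by "bar".
pattern = "foo(?=bar)"

# PCRE (perl = TRUE) supports lookahead: only "foobar" matches.
grepl(pattern, x, perl = TRUE)

# The default engine does not support lookahead, so grepl() signals an error;
# catch it here and return the message instead.
tryCatch(grepl(pattern, x), error = function(e) conditionMessage(e))
```

With a `%plike%` operator in data.table, the same pattern could be used directly inside `DT[...]`, whereas `%like%` (without `perl = TRUE`) would fail on it.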
