transform is ~100x slower on data.table than on data.frame #5492 #2

@DorisAmoakohene

@tdhock
Similar to the other issue I sent: I am running this performance test and hitting the error below. I am not able to install one of the commit IDs.

This is the link to the PR that fixes the issue (Rdatatable/data.table#5493) and the link to the issue (Rdatatable/data.table#5492)

Error message

Error in value[[3L]](cond) : 
  Error in revparse_single(object, branch): Error in 'git2r_revparse_single': Requested object could not be found

 when trying to checkout 93ce3ce1373bf733ebd2036e2883d2ffe377ab58
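
The message is raised by git2r's `revparse_single`, which reports that the requested object is not in the repository it was asked to search. A minimal sketch for checking which of the three SHAs resolve in the local clone, assuming `tdir` is the clone passed as `pkg.path` and that git2r is installed. `sha_found` is a hypothetical helper, not part of atime or git2r, and this only narrows down which commit is missing; it does not explain why.

```r
# Hypothetical helper (not part of atime or git2r): for each SHA, report
# whether it resolves to an object in the repository at repo.path.
sha_found <- function(repo.path, sha.vec, resolve = NULL) {
  if (is.null(resolve)) {
    # Default resolver: look the SHA up with git2r in the given clone.
    repo <- git2r::repository(repo.path)
    resolve <- function(sha) git2r::revparse_single(repo, sha)
  }
  vapply(sha.vec, function(sha) {
    tryCatch({
      resolve(sha)
      TRUE
    }, error = function(e) FALSE)  # lookup failed: object not found
  }, logical(1))
}

# The three commit IDs used in this issue.
sha.vec <- c(
  Before = "93ce3ce1373bf733ebd2036e2883d2ffe377ab58",
  Regression = "0bacebc9b813d84b9b267e0928b5fd7c7ea126fb",
  Fixed = "1e03fe7b890e63da9651d997ea52548c90b3ae32")

# Usage (assumes tdir exists and is a git clone of data.table):
# sha_found(tdir, sha.vec)
```

A `FALSE` entry means that SHA is not present in the clone, so the object would need to be available there before `atime_versions` can check it out.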

This is the code I am running


atime.list.5493 <- atime::atime_versions(
  pkg.path=tdir,
  pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
      pkg_find_replace <- function(glob, FIND, REPLACE){
        atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
      }
      Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
      Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
      new.Package_ <- paste0(Package_, "_", sha)
      pkg_find_replace(
        "DESCRIPTION", 
        paste0("Package:\\s+", old.Package),
        paste("Package:", new.Package))
      pkg_find_replace(
        file.path("src","Makevars.*in"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        sprintf('packageVersion\\("%s"\\)', old.Package),
        sprintf('packageVersion\\("%s"\\)', new.Package))
      pkg_find_replace(
        file.path("src", "init.c"),
        paste0("R_init_", Package_regex),
        paste0("R_init_", gsub("[.]", "_", new.Package_)))
      pkg_find_replace(
        "NAMESPACE",
        sprintf('useDynLib\\("?%s"?', Package_regex),
        paste0('useDynLib(', new.Package_))
    },

  N=10^seq(1,20),
  setup={
    set.seed(108)
    df <- data.frame(x = runif(N))
    dt <- as.data.table(df)
  },
  expr=data.table:::`[.data.table`(transform(dt, y = round(x))),
  "Before"="93ce3ce1373bf733ebd2036e2883d2ffe377ab58", # first commit in the PR (https://github.com/Rdatatable/data.table/pull/5493/commits)
  "Regression"="0bacebc9b813d84b9b267e0928b5fd7c7ea126fb", # parent of the last commit in the PR that fixes the issue (https://github.com/Rdatatable/data.table/commit/1e03fe7b890e63da9651d997ea52548c90b3ae32)
  "Fixed"="1e03fe7b890e63da9651d997ea52548c90b3ae32") # last commit in the PR that fixes the issue (https://github.com/Rdatatable/data.table/pull/5493/commits)
