dim.data.table and alloc.col are rather slow due to [[.data.frame calls #1433

@cooldome

I was recently profiling a data.table-heavy R script that involved many joins and linear scans on tables of 35,000 to 65,000 rows. I was surprised to see dim.data.table as the top contributor to run time. The cause was quickly identified: in the following implementation of dim.data.table, the problem is hidden in x[[1L]]:

dim.data.table <- function(x) {
    if (length(x)) c(length(x[[1L]]), length(x))
    else c(0L,0L)
    # TO DO: consider placing "dim" as an attribute updated on inserts. Saves this 'if'.
}

Behind the scenes, x[[1L]] dispatches to [[.data.frame (a data.table also inherits from data.frame), which performs rather poorly:

> `[[.data.frame`
function (x, ..., exact = TRUE) 
{
    na <- nargs() - (!missing(exact))
    if (!all(names(sys.call()) %in% c("", "exact"))) 
        warning("named arguments other than 'exact' are discouraged")
    if (na < 3L) 
        (function(x, i, exact) if (is.matrix(i)) 
            as.matrix(x)[[i]]
        else .subset2(x, i, exact = exact))(x, ..., exact = exact)
    else {
        col <- .subset2(x, ..2, exact = exact)
        i <- if (is.character(..1)) 
            pmatch(..1, row.names(x), duplicates.ok = TRUE)
        else ..1
        col[[i, exact = exact]]
    }
}
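As the printed source shows, the common single-index case ends up in .subset2 anyway, after the nargs(), sys.call() and anonymous-function overhead. One way to avoid that overhead is to call .subset2 directly. The following is only a minimal sketch under that assumption, not necessarily the fix data.table adopted; `dim_fast` is a hypothetical name, and a plain data.frame is used so the example runs without data.table:

```r
# Hypothetical helper (not part of data.table): same logic as dim.data.table,
# but the first column is read with .subset2(), which skips S3 dispatch
# and therefore never enters `[[.data.frame`.
dim_fast <- function(x) {
  if (length(x)) c(length(.subset2(x, 1L)), length(x))
  else c(0L, 0L)
}

df <- data.frame(a = 1:5, b = letters[1:5])
dim_fast(df)            # rows, then columns
dim_fast(data.frame())  # empty input falls through to c(0L, 0L)
```

.subset2 is the base R internal that `[[.data.frame` itself delegates to, so the result is the same for a plain column index.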

The second-largest contributor was alloc.col, which suffers from the same problem:

alloc.col <- function(DT, n=getOption("datatable.alloccol"), verbose=getOption("datatable.verbose"))
{
...   
for (i in seq_along(ans)) {
        # clear the same excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890).
        setattr(ans[[i]],"names",NULL)
        setattr(ans[[i]],"dim",NULL)
        setattr(ans[[i]],"dimnames",NULL)
    }
...
}

That is three [[.data.frame calls per column.
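To make the per-column cost concrete, here is a small base R sketch. It uses a hypothetical S3 class `counted` whose `[[` method counts how often it is dispatched, as a stand-in for `[[.data.frame`; the class and variable names are assumptions for illustration only. It compares the loop pattern above with a variant that fetches each column once via .subset2:

```r
# Hypothetical S3 class "counted": its `[[` method tallies each dispatch.
calls <- 0L
`[[.counted` <- function(x, i) {
  calls <<- calls + 1L
  .subset2(x, i)
}

ans <- structure(list(a = 1:3, b = 4:6), class = "counted")

# Pattern from the loop above: three `[[` lookups per column.
for (i in seq_along(ans)) {
  ans[[i]]; ans[[i]]; ans[[i]]
}
calls_repeated <- calls  # 2 columns * 3 lookups

# Hoisted variant: fetch each column once with .subset2(), bypassing dispatch.
calls <- 0L
for (i in seq_along(ans)) {
  col <- .subset2(ans, i)
  # the three attribute-clearing calls would operate on `col` here
}
calls_hoisted <- calls   # the S3 method is never entered
```

In the real loop the three setattr calls could presumably take `col` instead of `ans[[i]]`, since setattr modifies its argument by reference, but that is a suggestion rather than a tested patch.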
