memory allocation problems: "'Realloc' could not re-allocate memory" #2777

@jsams

I have a data.table with a very large number of rows, close to the MAXINT row limit. Reducing the table to one row per unique key value attempts to allocate far too much memory.

example

> affils = readRDS(file=sprintf(ALL_DATA, 'affils_all.rds'))
> setDT(affils)
> sapply(affils, class)
$fan_id
[1] "integer"

$contact_id
[1] "integer"

$is_first
[1] "integer"

$is_second
[1] "integer"

$created_at
[1] "POSIXct" "POSIXt" 

> nrow(affils)
[1] 2127968526
> key(affils)
[1] "fan_id"     "contact_id"
> affils = affils[, .(is_first=max(is_first), is_second=max(is_second),
+                     created_at=min(created_at)),
+                 keyby=.(fan_id, contact_id)]
Error in uniqlist(byval) : 
  'Realloc' could not re-allocate memory (18446744065119617024 bytes)

Enter a frame number, or 0 to exit   

1: affils[, .(is_first = max(is_first), is_second = max(is_second), created_at = min(created_at)), keyby = .(fan_id, contact_id)]
2: `[.data.table`(affils, , .(is_first = max(is_first), is_second = max(is_second), created_at = min(created_at)), keyby = .(fan_id, contact_id))
3: uniqlist(byval)
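
For reference, here is a scaled-down sketch of the same call on made-up data, showing what the operation is expected to return. The table `toy` and its values are hypothetical; only the column names and the aggregation itself are taken from the session above.

```r
library(data.table)

# Hypothetical toy data standing in for the real table (values are invented).
toy <- data.table(
  fan_id     = c(1L, 1L, 2L, 2L),
  contact_id = c(10L, 10L, 20L, 30L),
  is_first   = c(0L, 1L, 1L, 0L),
  is_second  = c(1L, 0L, 0L, 0L),
  created_at = as.POSIXct(c("2017-01-02", "2017-01-01",
                            "2017-01-03", "2017-01-04"), tz = "UTC")
)
setkey(toy, fan_id, contact_id)

# Same aggregation as in the failing call: one row per (fan_id, contact_id).
collapsed <- toy[, .(is_first = max(is_first), is_second = max(is_second),
                     created_at = min(created_at)),
                 keyby = .(fan_id, contact_id)]

# Number of distinct key combinations; equals nrow(toy) when there are no duplicates.
n_keys <- uniqueN(toy, by = key(toy))
```

On the toy data the two rows sharing key (1, 10) collapse into one, so `collapsed` has three rows and `n_keys` is 3.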

I really don't think it should be necessary to allocate that much memory to run this operation. There are tens of millions of unique users and tens of millions of unique contacts. The table probably contains no duplicate key values at all; I was running this only as a sanity check.
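
The byte count in the error message is itself a hint. A minimal sketch of the arithmetic follows: the reported size equals 2^64 - 2^33, which is what -2^33 looks like when read as an unsigned 64-bit number. That -2^33 corresponds to an `int` counter wrapping to INT_MIN and then being multiplied by a 4-byte element size is only an assumption about the cause, not something confirmed from the `uniqlist` source.

```r
# 2^64 - 2^33 is exactly representable as a double: it equals 2^33 * (2^31 - 1).
reported <- 2^64 - 2^33

# Decimal form, for comparison with the byte count in the error message.
reported_str <- sprintf("%.0f", reported)

# The same bit pattern read as a signed 64-bit value.
as_signed <- reported - 2^64      # -8589934592, i.e. -2^33

# Assumption: an int wrapped to INT_MIN (-2^31) and was scaled by sizeof(int) = 4.
int_min   <- -2^31
candidate <- int_min * 4
```

Here `as_signed` and `candidate` are the same value, which is consistent with (but does not prove) a 32-bit integer overflow in a buffer-growth calculation.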

Potentially related: I think I am seeing memory-count overflows (i.e. attempts to allocate a negative amount of memory) in rbind and/or forderv. Unfortunately I don't have the output, as I had to kill the screen window those R sessions were running in. I had tables structurally similar to the one above, but with far fewer rows, and was rbind'ing them and then running the same unique operation. I had checked that the total number of rows was less than MAXINT. I do have part of the error from my search history:

failed to realloc working memory stack data.table
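
The row-count check mentioned above can be sketched as follows. The list `tables` is a hypothetical stand-in for the tables being combined; the point is to sum the counts as doubles, because summing integers in R returns NA (with a warning) once the total exceeds `.Machine$integer.max`.

```r
# Hypothetical stand-ins for the tables passed to rbind.
tables <- list(
  data.frame(fan_id = 1:3, contact_id = 4:6),
  data.frame(fan_id = 7:8, contact_id = 9:10)
)

# Per-table row counts are integers; convert before summing to avoid overflow.
row_counts <- vapply(tables, nrow, integer(1))
total_rows <- sum(as.numeric(row_counts))

# TRUE when the combined table stays within the 32-bit row limit.
fits_in_int <- total_rows <= .Machine$integer.max

# For contrast: plain integer arithmetic past the limit yields NA.
overflowed <- suppressWarnings(.Machine$integer.max + 1L)
```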

None of this should be constrained by the amount of memory on the machine: the process was using only about 10-15% of the total available RAM.

sessionInfo

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/libf77blas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-12       lubridate_1.7.1     fasttime_1.0-2      data.table_1.10.4-3

loaded via a namespace (and not attached):
[1] compiler_3.4.2  magrittr_1.5    tools_3.4.2     Rcpp_0.12.14    stringi_1.1.6   grid_3.4.2      stringr_1.2.0   lattice_0.20-35
