rbindlist() idcol returns garbage ids for lists containing unequal-length vectors #3785

@shrektan

I think this is a bug. The id is assigned based on the length of the first vector in each sub-element. It should be based on the maximum length across all the vectors of that sub-element.

library(data.table)
x <- 1:1000

# notice the last few TAGs
out1 <- lapply(x, function(.) {
  list(., 1:2, 2:3)
})
out1 <- rbindlist(out1, idcol = 'TAG')
out1
#>       TAG   V1 V2 V3
#>    1:   1    1  1  2
#>    2:   2    1  2  3
#>    3:   3    2  1  2
#>    4:   4    2  2  3
#>    5:   5    3  1  2
#>   ---               
#> 1996:   2  998  2  3
#> 1997:  30  999  1  2
#> 1998:  17  999  2  3
#> 1999:   4 1000  1  2
#> 2000:  20 1000  2  3

# with data.table() there is no problem, because the lengths have been unified first
out2 <- lapply(x, function(.) {
  data.table(., 1:2, 2:3)
})
out2 <- rbindlist(out2, idcol = 'TAG')
out2
#>        TAG    . V2 V3
#>    1:    1    1  1  2
#>    2:    1    1  2  3
#>    3:    2    2  1  2
#>    4:    2    2  2  3
#>    5:    3    3  1  2
#>   ---                
#> 1996:  998  998  2  3
#> 1997:  999  999  1  2
#> 1998:  999  999  2  3
#> 1999: 1000 1000  1  2
#> 2000: 1000 1000  2  3

# putting the unequal-length vector last causes no problem either
out3 <- lapply(x, function(.) {
  list(1:2, 2:3, .)
})
out3 <- rbindlist(out3, idcol = 'TAG')
out3
#>        TAG V1 V2   V3
#>    1:    1  1  2    1
#>    2:    1  2  3    1
#>    3:    2  1  2    2
#>    4:    2  2  3    2
#>    5:    3  1  2    3
#>   ---                
#> 1996:  998  2  3  998
#> 1997:  999  1  2  999
#> 1998:  999  2  3  999
#> 1999: 1000  1  2 1000
#> 2000: 1000  2  3 1000

Created on 2019-08-21 by the reprex package (v0.2.1)
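
For reference, the expected id described above can be sketched in base R. This is only an illustration, not data.table's internal logic: `expected_id()` is a hypothetical helper, and it assumes each sub-element contributes as many rows as its longest vector.

```r
# Hypothetical helper (not part of data.table): the id each row should get,
# assuming every sub-list contributes max(lengths(el)) rows after recycling.
expected_id <- function(l) {
  # number of rows per sub-element = length of its longest vector
  n_rows <- vapply(l, function(el) max(lengths(el)), integer(1))
  # repeat each sub-element's position once per row it contributes
  rep(seq_along(l), times = n_rows)
}

x <- 1:1000
l <- lapply(x, function(.) list(., 1:2, 2:3))
id <- expected_id(l)

length(id)   # 2000
head(id, 5)  # 1 1 2 2 3
tail(id, 5)  # 998 999 999 1000 1000
```

These values match the TAG column in the `data.table()` and "unequal-length vector last" cases, so comparing them against the TAG column that `rbindlist(l, idcol = 'TAG')` returns should show where the ids diverge.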
