as.data.table.xts(foo) gives wrong index values when 'x' is in the column names. #4897

@emilsjoerup

I have stumbled across what I believe to be a bug in as.data.table.xts(foo): the index values, which should correspond to the timestamps of the observations, sometimes come out simply as the row numbers. I have searched this repository and Stack Overflow, but found nothing on this topic.

From my experimentation, this occurs when a column in foo has the name "x". The number of columns does not affect the bug, and the order of the columns does not change the result either.

I have written a small example that shows both the expected behavior and the misbehavior, and the cases in which each occurs.

Restarting R session...

> ## Pre-fix
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3   
> library(xts)
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

> library(data.table)
data.table 1.13.7 IN DEVELOPMENT built 2021-02-11 11:21:19 UTC using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:xts’:

    first, last

> 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.7 xts_0.12.1        zoo_1.8-8        

loaded via a namespace (and not attached):
[1] compiler_4.0.3  tools_4.0.3     grid_4.0.3      lattice_0.20-41
> a <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("AAPL", "MSFT")))
> b <- xts(cbind(1:10), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
> c <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x", "y")))
> d <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "x")))
> e <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "z")))
> as.data.table(a)
                  index  AAPL  MSFT
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(b)
    index     x
    <int> <int>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10
> as.data.table(c)
    index     x     y
    <int> <int> <int>
 1:     1     1   101
 2:     2     2   102
 3:     3     3   103
 4:     4     4   104
 5:     5     5   105
 6:     6     6   106
 7:     7     7   107
 8:     8     8   108
 9:     9     9   109
10:    10    10   110
> as.data.table(d)
    index     y     x
    <int> <int> <int>
 1:     1     1   101
 2:     2     2   102
 3:     3     3   103
 4:     4     4   104
 5:     5     5   105
 6:     6     6   106
 7:     7     7   107
 8:     8     8   108
 9:     9     9   109
10:    10    10   110
> as.data.table(e)
                  index     y     z
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> 
> 
> 
> x <- xts(1:10, as.POSIXct(1:10, origin = "1970-01-01", tz = "UTC"))
> as.data.table(x)
                  index    V1
                 <POSc> <int>
 1: 1970-01-01 00:00:01     1
 2: 1970-01-01 00:00:02     2
 3: 1970-01-01 00:00:03     3
 4: 1970-01-01 00:00:04     4
 5: 1970-01-01 00:00:05     5
 6: 1970-01-01 00:00:06     6
 7: 1970-01-01 00:00:07     7
 8: 1970-01-01 00:00:08     8
 9: 1970-01-01 00:00:09     9
10: 1970-01-01 00:00:10    10
> colnames(x) <- "x"
> as.data.table(x)
    index     x
    <int> <int>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10

I have implemented a simple fix by just using set() instead of "[.data.table" to assign the index value to the output data table in as.data.table.xts(). The output after the fix is:

Restarting R session...

> library(data.table)
> ## Post-fix
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.7 xts_0.12.1        zoo_1.8-8        

loaded via a namespace (and not attached):
[1] compiler_4.0.3  tools_4.0.3     grid_4.0.3      lattice_0.20-41
> library(xts)
> library(data.table)
> 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.7 xts_0.12.1        zoo_1.8-8        

loaded via a namespace (and not attached):
[1] compiler_4.0.3  tools_4.0.3     grid_4.0.3      lattice_0.20-41
> a <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("AAPL", "MSFT")))
> b <- xts(cbind(1:10), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
> c <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x", "y")))
> d <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "x")))
> e <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "z")))
> as.data.table(a)
                  index  AAPL  MSFT
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(b)
                  index     x
                 <POSc> <int>
 1: 1970-01-01 00:01:41     1
 2: 1970-01-01 00:01:42     2
 3: 1970-01-01 00:01:43     3
 4: 1970-01-01 00:01:44     4
 5: 1970-01-01 00:01:45     5
 6: 1970-01-01 00:01:46     6
 7: 1970-01-01 00:01:47     7
 8: 1970-01-01 00:01:48     8
 9: 1970-01-01 00:01:49     9
10: 1970-01-01 00:01:50    10
> as.data.table(c)
                  index     x     y
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(d)
                  index     y     x
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(e)
                  index     y     z
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> 
> 
> 
> x <- xts(1:10, as.POSIXct(1:10, origin = "1970-01-01", tz = "UTC"))
> as.data.table(x)
                  index    V1
                 <POSc> <int>
 1: 1970-01-01 00:00:01     1
 2: 1970-01-01 00:00:02     2
 3: 1970-01-01 00:00:03     3
 4: 1970-01-01 00:00:04     4
 5: 1970-01-01 00:00:05     5
 6: 1970-01-01 00:00:06     6
 7: 1970-01-01 00:00:07     7
 8: 1970-01-01 00:00:08     8
 9: 1970-01-01 00:00:09     9
10: 1970-01-01 00:00:10    10
> colnames(x) <- "x"
> as.data.table(x)
                  index     x
                 <POSc> <int>
 1: 1970-01-01 00:00:01     1
 2: 1970-01-01 00:00:02     2
 3: 1970-01-01 00:00:03     3
 4: 1970-01-01 00:00:04     4
 5: 1970-01-01 00:00:05     5
 6: 1970-01-01 00:00:06     6
 7: 1970-01-01 00:00:07     7
 8: 1970-01-01 00:00:08     8
 9: 1970-01-01 00:00:09     9
10: 1970-01-01 00:00:10    10

As shown, it now gives the correct output in all cases.
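
To illustrate why switching to set() can make a difference, here is a minimal sketch that does not use xts at all. It assumes the problem comes down to how a name is looked up when a column and a variable in the calling scope share it; the names via_j and via_set are made up for the illustration and are not part of as.data.table.xts().

```r
library(data.table)

# A calling-scope variable and a column that share the name "x".
x  <- as.POSIXct(101:103, origin = "1970-01-01", tz = "UTC")
dt <- data.table(x = 1:3)

# j is evaluated with the table's columns in scope first,
# so `x` here is the integer column, not the POSIXct vector.
dt[, via_j := x]

# `value` is an ordinary function argument, evaluated in the calling scope,
# so `x` here is the POSIXct vector.
set(dt, j = "via_set", value = x)

class(dt$via_j)    # "integer"
class(dt$via_set)  # "POSIXct" "POSIXt"
```

With set(), the right-hand side never passes through the column lookup of "[.data.table", which is consistent with the post-fix output above.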

This is just a workaround. I can't quite figure out what actually causes this behavior. I think it has something to do with line 1298 in data.table.R, but I am not familiar enough with the code to be sure.
If I use str2lang() to change the jsub from zoo::index(x) to return(x), I get the integer vector 1:10, whereas I would expect to get the original xts object. I think this is somewhat related to point 2.13 in the FAQ, but to me it feels more like a bug. If it is not a bug, the behavior should be documented somewhere.
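
The return(x) experiment can be approximated outside the package. The sketch below is only a guess at the mechanism, not a reading of the actual code: class_seen_in_j() is a hypothetical stand-in for a conversion function whose argument is named x, and cols sets the column names of the table built inside it.

```r
library(data.table)

# Hypothetical stand-in: reports which object the name `x` refers to
# when it is used inside j of a table built within the function.
class_seen_in_j <- function(x, cols) {
  r <- setnames(data.table(1:3, 4:6), cols)
  r[, class(x)[1L]]
}

arg <- as.POSIXct(1:3, origin = "1970-01-01", tz = "UTC")

no_clash <- class_seen_in_j(arg, c("y", "z"))  # no column "x": j sees the argument
clash    <- class_seen_in_j(arg, c("x", "y"))  # column "x" shadows the argument

no_clash  # "POSIXct"
clash     # "integer"
```

If the same lookup applies to zoo::index(x) inside as.data.table.xts(), the index would be computed from the integer column rather than from the xts object, which would explain why the row numbers appear.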

I hope I have provided enough information to be helpful. If the workaround is deemed acceptable, I can create a PR with the fix and a test or two.
