I have stumbled across what I believe to be a bug in as.data.table.xts(foo): the index values, which should be the timestamps of the observations, sometimes come out as plain row numbers. I have searched this repository and SO, but found nothing on this topic.
From my experimentation, this occurs whenever a column in foo is named "x". Neither the number of columns nor their order affects the bug.
I have written a small example that shows both the expected behavior and the misbehavior, and the cases in which each occurs.
Restarting R session...
> ## Pre-fix
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_DK.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_DK.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3
> library(xts)
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
> library(data.table)
data.table 1.13.7 IN DEVELOPMENT built 2021-02-11 11:21:19 UTC using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
Attaching package: ‘data.table’
The following objects are masked from ‘package:xts’:
first, last
>
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_DK.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_DK.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.7 xts_0.12.1 zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3 grid_4.0.3 lattice_0.20-41
> a <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("AAPL", "MSFT")))
> b <- xts(cbind(1:10), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
> c <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x", "y")))
> d <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "x")))
> e <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "z")))
> as.data.table(a)
index AAPL MSFT
<POSc> <int> <int>
1: 1970-01-01 00:01:41 1 101
2: 1970-01-01 00:01:42 2 102
3: 1970-01-01 00:01:43 3 103
4: 1970-01-01 00:01:44 4 104
5: 1970-01-01 00:01:45 5 105
6: 1970-01-01 00:01:46 6 106
7: 1970-01-01 00:01:47 7 107
8: 1970-01-01 00:01:48 8 108
9: 1970-01-01 00:01:49 9 109
10: 1970-01-01 00:01:50 10 110
> as.data.table(b)
index x
<int> <int>
1: 1 1
2: 2 2
3: 3 3
4: 4 4
5: 5 5
6: 6 6
7: 7 7
8: 8 8
9: 9 9
10: 10 10
> as.data.table(c)
index x y
<int> <int> <int>
1: 1 1 101
2: 2 2 102
3: 3 3 103
4: 4 4 104
5: 5 5 105
6: 6 6 106
7: 7 7 107
8: 8 8 108
9: 9 9 109
10: 10 10 110
> as.data.table(d)
index y x
<int> <int> <int>
1: 1 1 101
2: 2 2 102
3: 3 3 103
4: 4 4 104
5: 5 5 105
6: 6 6 106
7: 7 7 107
8: 8 8 108
9: 9 9 109
10: 10 10 110
> as.data.table(e)
index y z
<POSc> <int> <int>
1: 1970-01-01 00:01:41 1 101
2: 1970-01-01 00:01:42 2 102
3: 1970-01-01 00:01:43 3 103
4: 1970-01-01 00:01:44 4 104
5: 1970-01-01 00:01:45 5 105
6: 1970-01-01 00:01:46 6 106
7: 1970-01-01 00:01:47 7 107
8: 1970-01-01 00:01:48 8 108
9: 1970-01-01 00:01:49 9 109
10: 1970-01-01 00:01:50 10 110
>
>
>
> x <- xts(1:10, as.POSIXct(1:10, origin = "1970-01-01", tz = "UTC"))
> as.data.table(x)
index V1
<POSc> <int>
1: 1970-01-01 00:00:01 1
2: 1970-01-01 00:00:02 2
3: 1970-01-01 00:00:03 3
4: 1970-01-01 00:00:04 4
5: 1970-01-01 00:00:05 5
6: 1970-01-01 00:00:06 6
7: 1970-01-01 00:00:07 7
8: 1970-01-01 00:00:08 8
9: 1970-01-01 00:00:09 9
10: 1970-01-01 00:00:10 10
> colnames(x) <- "x"
> as.data.table(x)
index x
<int> <int>
1: 1 1
2: 2 2
3: 3 3
4: 4 4
5: 5 5
6: 6 6
7: 7 7
8: 8 8
9: 9 9
10: 10 10
I have implemented a simple fix: using set() instead of "[.data.table" to assign the index values to the output data.table in as.data.table.xts(). The output after the fix is:
Restarting R session...
> library(data.table)
> ## Post-fix
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_DK.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_DK.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.7 xts_0.12.1 zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3 grid_4.0.3 lattice_0.20-41
> library(xts)
> library(data.table)
>
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_DK.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_DK.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.7 xts_0.12.1 zoo_1.8-8
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3 grid_4.0.3 lattice_0.20-41
> a <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("AAPL", "MSFT")))
> b <- xts(cbind(1:10), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
> c <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x", "y")))
> d <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "x")))
> e <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "z")))
> as.data.table(a)
index AAPL MSFT
<POSc> <int> <int>
1: 1970-01-01 00:01:41 1 101
2: 1970-01-01 00:01:42 2 102
3: 1970-01-01 00:01:43 3 103
4: 1970-01-01 00:01:44 4 104
5: 1970-01-01 00:01:45 5 105
6: 1970-01-01 00:01:46 6 106
7: 1970-01-01 00:01:47 7 107
8: 1970-01-01 00:01:48 8 108
9: 1970-01-01 00:01:49 9 109
10: 1970-01-01 00:01:50 10 110
> as.data.table(b)
index x
<POSc> <int>
1: 1970-01-01 00:01:41 1
2: 1970-01-01 00:01:42 2
3: 1970-01-01 00:01:43 3
4: 1970-01-01 00:01:44 4
5: 1970-01-01 00:01:45 5
6: 1970-01-01 00:01:46 6
7: 1970-01-01 00:01:47 7
8: 1970-01-01 00:01:48 8
9: 1970-01-01 00:01:49 9
10: 1970-01-01 00:01:50 10
> as.data.table(c)
index x y
<POSc> <int> <int>
1: 1970-01-01 00:01:41 1 101
2: 1970-01-01 00:01:42 2 102
3: 1970-01-01 00:01:43 3 103
4: 1970-01-01 00:01:44 4 104
5: 1970-01-01 00:01:45 5 105
6: 1970-01-01 00:01:46 6 106
7: 1970-01-01 00:01:47 7 107
8: 1970-01-01 00:01:48 8 108
9: 1970-01-01 00:01:49 9 109
10: 1970-01-01 00:01:50 10 110
> as.data.table(d)
index y x
<POSc> <int> <int>
1: 1970-01-01 00:01:41 1 101
2: 1970-01-01 00:01:42 2 102
3: 1970-01-01 00:01:43 3 103
4: 1970-01-01 00:01:44 4 104
5: 1970-01-01 00:01:45 5 105
6: 1970-01-01 00:01:46 6 106
7: 1970-01-01 00:01:47 7 107
8: 1970-01-01 00:01:48 8 108
9: 1970-01-01 00:01:49 9 109
10: 1970-01-01 00:01:50 10 110
> as.data.table(e)
index y z
<POSc> <int> <int>
1: 1970-01-01 00:01:41 1 101
2: 1970-01-01 00:01:42 2 102
3: 1970-01-01 00:01:43 3 103
4: 1970-01-01 00:01:44 4 104
5: 1970-01-01 00:01:45 5 105
6: 1970-01-01 00:01:46 6 106
7: 1970-01-01 00:01:47 7 107
8: 1970-01-01 00:01:48 8 108
9: 1970-01-01 00:01:49 9 109
10: 1970-01-01 00:01:50 10 110
>
>
>
> x <- xts(1:10, as.POSIXct(1:10, origin = "1970-01-01", tz = "UTC"))
> as.data.table(x)
index V1
<POSc> <int>
1: 1970-01-01 00:00:01 1
2: 1970-01-01 00:00:02 2
3: 1970-01-01 00:00:03 3
4: 1970-01-01 00:00:04 4
5: 1970-01-01 00:00:05 5
6: 1970-01-01 00:00:06 6
7: 1970-01-01 00:00:07 7
8: 1970-01-01 00:00:08 8
9: 1970-01-01 00:00:09 9
10: 1970-01-01 00:00:10 10
> colnames(x) <- "x"
> as.data.table(x)
index x
<POSc> <int>
1: 1970-01-01 00:00:01 1
2: 1970-01-01 00:00:02 2
3: 1970-01-01 00:00:03 3
4: 1970-01-01 00:00:04 4
5: 1970-01-01 00:00:05 5
6: 1970-01-01 00:00:06 6
7: 1970-01-01 00:00:07 7
8: 1970-01-01 00:00:08 8
9: 1970-01-01 00:00:09 9
10: 1970-01-01 00:00:10 10
As shown, it now gives the correct output.
This is just a workaround. I can't quite figure out what actually causes this behavior. I think it has something to do with line 1298 in data.table.R, but I am not 'into' the code enough to be sure.
If I change the jsub from zoo::index(x) to return(x) using the str2lang() function, I get the integer vector 1:10, whereas I would expect to get the original xts object. I think this is somewhat related to point 2.13 in the FAQ, but it feels more like a bug to me. If it is intended, should this behavior be documented somewhere?
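One possible explanation for the observation above, offered as an assumption rather than a confirmed diagnosis: inside "[.data.table", names used in j are looked up among the table's columns before the calling scope, so a column named x would shadow the xts object x. The minimal sketch below uses only data.table. The names r, via_bracket and via_set are made up for illustration, and the outer x is a plain POSIXct vector standing in for the xts object.

```r
library(data.table)

# Outer variable named `x` (stand-in for the xts object in the report).
x <- as.POSIXct(101:103, origin = "1970-01-01", tz = "UTC")

# A table that happens to contain a column that is also called `x`.
r <- data.table(x = 1:3)

# In j, `x` resolves to the integer column r$x, not to the outer POSIXct vector.
r[, via_bracket := class(x)[1L]]

# set() evaluates `value` in the calling scope, so `x` is the outer variable here.
set(r, j = "via_set", value = class(x)[1L])

print(r)
```

If this is indeed the mechanism, via_bracket holds "integer" and via_set holds "POSIXct", which would match the pre-fix and post-fix outputs shown earlier and would explain why switching to set() avoids the problem.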
I hope I have provided enough information to be helpful. If the workaround is deemed acceptable, I can create a PR with the fix and a test or two.