This is likely a non-issue (I do understand that these numbers are not meaningfully different), and apologies if I missed a mention of this in the documentation or previous issues (I did look).
There are certain values (I ran into one in the wild) for which fread and read.table (which agrees with R's parser) parse a string representing a floating-point number into equivalent but non-identical byte representations.
Note that this means a cache cannot be trusted to stay non-stale when upgrading read.table calls to fread, even though the docs and a naive understanding of what is happening suggest it could.
Reproducible example:
library(data.table)
## data.table 1.12.8 using 12 threads (see ?getDTthreads). Latest news: r-datatable.com
exchar = "0.8060667366"
exnum = 0.8060667366
rtres = read.table(text = exchar)
rtres
## V1
## 1 0.8060667
rtval = rtres[1,1]
identical(rtval, exnum)
## [1] TRUE
frres = fread(text = c(exchar, exchar))
frres
## V1
## 1: 0.8060667
## 2: 0.8060667
frval = frres[1,V1]
identical(frval, exnum)
## [1] FALSE
sprintf("%1.17f", rtval)
## [1] "0.80606673659999994"
sprintf("%1.17f", frval)
## [1] "0.80606673660000006"
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## Matrix products: default
## BLAS: <snip>/R-3.6.1/lib64/R/lib/libRblas.so
## LAPACK: <snip>/R-3.6.1/lib64/R/lib/libRlapack.so
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
## other attached packages:
## [1] data.table_1.12.8
## loaded via a namespace (and not attached):
## [1] compiler_3.6.1 tools_3.6.1
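The two printed values in the example differ by roughly 1.2e-16, which is consistent with a single unit in the last place for a double in [0.5, 1), where adjacent doubles are 2^-53 apart. The sketch below uses base R only to show why `identical()` fails while `all.equal()` passes for such a pair, and one possible way to keep a cache key stable. `b` is constructed by hand as a stand-in for the fread result (it is not produced by fread here), and `cache_key_value()` is a hypothetical helper, not something data.table provides.

```r
# Stand-ins for the two parsed values.
a <- 0.8060667366   # what R's own parser yields for the literal
b <- a + 2^-53      # the next representable double above `a`
                    # (assumption: this mimics the fread result)

# Hex-float output makes the last-bit difference visible.
sprintf("%a", a)
sprintf("%a", b)

identical(a, b)           # FALSE: the bit patterns differ
isTRUE(all.equal(a, b))   # TRUE: equal within the default tolerance

# Hypothetical helper: round to 15 significant digits before the value
# feeds a hash or an identical() check, so a last-bit difference between
# parsers does not invalidate the cache.
cache_key_value <- function(x, digits = 15) signif(x, digits)

identical(cache_key_value(a), cache_key_value(b))
```

Rounding to 15 significant digits is a judgment call: it discards the last bit or two of precision, which is only acceptable if the cached computation does not depend on them.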