Improve `fread` for very small or very large fp numbers. by QuLogic · Pull Request #4165 · Rdatatable/data.table

QuLogic · 2020-01-09T06:56:25Z

On non-x86 architectures (armv7hl and ppc64le), test 1018 fails because one number is parsed slightly differently. In base R, `R_strtod` handles small numbers by pre-dividing the numerator and divisor before applying the exponent part (instead of dividing everything at once). However, it does not use a lookup table.

For `fread`, trim the exponent lookup table from ±350 to ±300, and if an exponent falls in the removed range, do two multiplications instead. This has approximately the same effect as the approach in base R.
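The two-step scaling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `pow10_table`, `init_pow10_table`, and `scale_by_pow10` are hypothetical names, the table is filled at run time with `strtold` instead of being stored as literal constants, and the caller is assumed to have already rejected exponents outside ±350.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define POW10_LIMIT 300 /* table covers 10^-300 .. 10^300, per the PR description */

static long double pow10_table[2 * POW10_LIMIT + 1];

/* Hypothetical helper: fill the table at run time. A literal table would
   avoid this call; run-time filling just keeps the sketch short. */
static void init_pow10_table(void) {
  char buf[16];
  for (int i = 0; i <= 2 * POW10_LIMIT; i++) {
    snprintf(buf, sizeof buf, "1e%d", i - POW10_LIMIT);
    pow10_table[i] = strtold(buf, NULL);
  }
}

/* Hypothetical helper: return acc * 10^e, assuming -350 <= e <= 350. */
static double scale_by_pow10(uint64_t acc, int e) {
  long double r = (long double)acc;
  if (e < -POW10_LIMIT || e > POW10_LIMIT) {
    /* First multiplication: apply the part of the exponent beyond +/-300. */
    int extra = e < 0 ? e + POW10_LIMIT : e - POW10_LIMIT; /* in [-50, 50] */
    r *= pow10_table[extra + POW10_LIMIT];
    e -= extra; /* e is now exactly -300 or +300 */
  }
  /* Second (or only) multiplication: apply the in-table exponent. */
  r *= pow10_table[e + POW10_LIMIT];
  return (double)r;
}
```

After calling `init_pow10_table()` once, `scale_by_pow10(1, -320)` goes through both multiplications (by 10^-20, then by 10^-300), so no constant smaller than 10^-300 ever has to be stored.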

Removing some of the range from the lookup table also fixes several warnings such as:

freadLookups.h:57:1: warning: floating constant truncated to zero [-Woverflow]
   57 | 1.0E-324L,
      | ^~~~~~~~~
freadLookups.h:690:1: warning: floating constant exceeds range of 'long double' [-Woverflow]
  690 | 1.0E309L,
      | ^~~~~~~~
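A plausible reading of these warnings is that, on the affected targets, `long double` has no wider exponent range than `double`, so literals such as `1.0E-324L` and `1.0E309L` cannot be encoded. The sketch below shows one way to check a decimal exponent against the platform's limits from `<float.h>`; `ldbl_literal_in_range` is a hypothetical helper, not something in data.table.

```c
#include <float.h>

/* Hypothetical helper: nonzero if 10^exp10 lies within the normalized
   range of long double on this platform, zero otherwise. */
static int ldbl_literal_in_range(int exp10) {
  return exp10 >= LDBL_MIN_10_EXP && exp10 <= LDBL_MAX_10_EXP;
}
```

On a platform where `long double` is the same as `double`, this would report both `-324` and `309` as out of range, which matches the two warnings above.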

Closes #3492 now that #4213 closed the other issue in it.
The first part of #4032 is the same issue (test 1018) fixed here. A build on those architectures can be found here.
Closes #4097 too.


codecov · 2020-01-09T07:05:31Z

Codecov Report

Merging #4165 into master will increase coverage by 0.07%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4165      +/-   ##
==========================================
+ Coverage   99.54%   99.61%   +0.07%     
==========================================
  Files          72       72              
  Lines       13937    13901      -36     
==========================================
- Hits        13873    13847      -26     
+ Misses         64       54      -10

Impacted Files	Coverage Δ
R/merge.R	`100% <ø> (ø)`	⬆️
R/as.data.table.R	`100% <ø> (ø)`	⬆️
src/rbindlist.c	`100% <100%> (ø)`	⬆️
src/dogroups.c	`100% <100%> (+3.14%)`	⬆️
R/between.R	`100% <100%> (ø)`	⬆️
src/frank.c	`100% <100%> (ø)`	⬆️
R/frank.R	`100% <100%> (ø)`	⬆️
src/subset.c	`100% <100%> (ø)`	⬆️
src/freadR.c	`100% <100%> (ø)`	⬆️
src/nafill.c	`100% <100%> (ø)`	⬆️
... and 20 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a1aa990...dc7659f.

jangorecki · 2020-01-09T09:48:18Z

Is there no need to update the manual? I see one example there related to precision.

QuLogic · 2020-01-10T00:13:40Z

Which one do you mean? Note that this doesn't really change the actual precision, only the intermediate precision, so that final results are consistent.

jangorecki · 2020-01-10T03:38:42Z

I meant this section in the examples:

# Numerical precision :

QuLogic · 2020-01-30T06:01:41Z

Ah, I don't believe that section requires any changes. The only effect here is if the fread results were compared with expected values written in R code directly (and only on specific architectures), but there are no comparisons.

mattdowle · 2020-02-16T20:07:32Z

+    // and then remove extra from e.
+    // This avoids having to store very small or very large constants that may
+    // fail to be encoded by the compiler, even though the values can actually
+    // be stored correctly.


Very clear comment and wording. Perfect.
I invited you to be project member in your other PR #4213, and added you to contributor list there too.
Many thanks for investigating and fixing this!

jangorecki requested a review from mattdowle January 9, 2020 09:43

QuLogic mentioned this pull request Jan 30, 2020

Compiler warning on platform: armv8l-unknown-linux-gnueabi (32-bit) #4097

Closed

mattdowle added this to the 1.12.9 milestone Feb 16, 2020

Merge branch 'master' into fread-alt-arch

dc7659f

mattdowle approved these changes Feb 16, 2020


mattdowle merged commit ff3e7d4 into Rdatatable:master Feb 16, 2020

QuLogic deleted the fread-alt-arch branch February 17, 2020 10:27

MichaelChirico mentioned this pull request May 20, 2020

fread occasionally reads in differently rounded non-exact fp numbers than base R #4461

Closed

jangorecki modified the milestones: 1.12.11, 1.12.9 May 26, 2020

mattdowle merged 2 commits into Rdatatable:master from QuLogic:fread-alt-arch
