Skip to content

Improve fread for very small or very large fp numbers.#4165

Merged
mattdowle merged 2 commits intoRdatatable:masterfrom
QuLogic:fread-alt-arch
Feb 16, 2020
Merged

Improve fread for very small or very large fp numbers.#4165
mattdowle merged 2 commits intoRdatatable:masterfrom
QuLogic:fread-alt-arch

Conversation

@QuLogic
Copy link
Copy Markdown
Contributor

@QuLogic QuLogic commented Jan 9, 2020

On non-x86 architectures (armv7hl and ppc64le), test 1018 fails with a slightly differently parsed number. In base R, R_strtod handles small numbers by pre-dividing numerator and divisor before applying the exponent part (instead of dividing all together.) However, it does not use a lookup table.

For fread, trim the exponent lookup table from ±350 to ±300, and if anything is in that removed range, do two multiplications instead. This results in approximately the same effect as in base R.

Removing some of the range from the lookup table also fixes several warnings such as:

freadLookups.h:57:1: warning: floating constant truncated to zero [-Woverflow]
   57 | 1.0E-324L,
      | ^~~~~~~~~
freadLookups.h:690:1: warning: floating constant exceeds range of 'long double' [-Woverflow]
  690 | 1.0E309L,
      | ^~~~~~~~

Closes #3492 now that #4213 closed the other issue in it.
The first part of #4032 is the same issue (test 1018) fixed here. A build on those architectures can be found here.
Closes #4097 too.

On non-x86 architectures (armv7hl and ppc64le), test 1018 fails with a
slightly differently parsed number. In base R, `R_strtod` handles small
numbers by pre-dividing numerator and divisor before applying the
exponent part (instead of dividing all together.) However, it does not
use a lookup table.

For `fread`, trim the exponent lookup table from ±350 to ±300, and if
anything is in that removed range, do two multiplications instead. This
results in approximately the same effect as in base R.

Removing some of the range from the lookup table also fixes several
warnings such as:

```
freadLookups.h:57:1: warning: floating constant truncated to zero [-Woverflow]
   57 | 1.0E-324L,
      | ^~~~~~~~~
freadLookups.h:690:1: warning: floating constant exceeds range of 'long double' [-Woverflow]
  690 | 1.0E309L,
      | ^~~~~~~~
```

See Rdatatable#3492 and Rdatatable#4032.
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 9, 2020

Codecov Report

Merging #4165 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4165      +/-   ##
==========================================
+ Coverage   99.54%   99.61%   +0.07%     
==========================================
  Files          72       72              
  Lines       13937    13901      -36     
==========================================
- Hits        13873    13847      -26     
+ Misses         64       54      -10
Impacted Files Coverage Δ
R/merge.R 100% <ø> (ø) ⬆️
R/as.data.table.R 100% <ø> (ø) ⬆️
src/rbindlist.c 100% <100%> (ø) ⬆️
src/dogroups.c 100% <100%> (+3.14%) ⬆️
R/between.R 100% <100%> (ø) ⬆️
src/frank.c 100% <100%> (ø) ⬆️
R/frank.R 100% <100%> (ø) ⬆️
src/subset.c 100% <100%> (ø) ⬆️
src/freadR.c 100% <100%> (ø) ⬆️
src/nafill.c 100% <100%> (ø) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1aa990...dc7659f. Read the comment docs.

@jangorecki jangorecki requested a review from mattdowle January 9, 2020 09:43
@jangorecki
Copy link
Copy Markdown
Member

no need to update manual? I see one example there related to precision

@QuLogic
Copy link
Copy Markdown
Contributor Author

QuLogic commented Jan 10, 2020

Which do you mean? Note this doesn't really change actual precision, just intermediary precision, so that final results are consistent.

@jangorecki
Copy link
Copy Markdown
Member

The section in examples I meant

# Numerical precision :

@QuLogic
Copy link
Copy Markdown
Contributor Author

QuLogic commented Jan 30, 2020

Ah, I don't believe that section requires any changes. The only effect here is if the fread results were compared with expected values written in R code directly (and only on specific architectures), but there are no comparisons.

@mattdowle mattdowle added this to the 1.12.9 milestone Feb 16, 2020
Comment thread src/fread.c
// and then remove extra from e.
// This avoids having to store very small or very large constants that may
// fail to be encoded by the compiler, even though the values can actually
// be stored correctly.
Copy link
Copy Markdown
Member

@mattdowle mattdowle Feb 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear comment and wording. Perfect.
I invited you to be project member in your other PR #4213, and added you to contributor list there too.
Many thanks for investigating and fixing this!

@mattdowle mattdowle merged commit ff3e7d4 into Rdatatable:master Feb 16, 2020
@QuLogic QuLogic deleted the fread-alt-arch branch February 17, 2020 10:27
@jangorecki jangorecki modified the milestones: 1.12.11, 1.12.9 May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compiler warning on platform: armv8l-unknown-linux-gnueabi (32-bit) Two test failures on armv7hl

3 participants