Gforce edge case creates segfault by ben-schwen · Pull Request #5109 · Rdatatable/data.table

ben-schwen · 2021-08-22T18:58:15Z

Closes #1994.

Considered functions:

codecov · 2021-08-22T19:09:01Z

Codecov Report

Merging #5109 (0725559) into master (897ac6d) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5109      +/-   ##
==========================================
- Coverage   99.54%   99.53%   -0.01%     
==========================================
  Files          76       76              
  Lines       14623    14468     -155     
==========================================
- Hits        14556    14401     -155     
  Misses         67       67

Impacted Files	Coverage Δ
R/test.data.table.R	`100.00% <100.00%> (ø)`
src/gsumm.c	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 897ac6d...0725559. Read the comment docs.

mattdowle · 2021-08-23T07:13:12Z

        if (ansd[thisgrp]==NA_STRING) continue;
        const int ix = (irowslen == -1) ? i : irows[i]-1;
-        if (xd[ix]==NA_STRING || (strcmp(CHAR(xd[ix]), CHAR(ansd[thisgrp]))<0)==min)
+        if (ix+1==NA_INTEGER) SET_STRING_ELT(ans, thisgrp, NA_STRING);


INT_MAX+1 overflows to INT_MIN (==NA_INTEGER), so if there's a way to check irows[i]==NA_INTEGER directly, like the other cases, that would be neater. This matters e.g. in the extreme edge case of nrow==INT_MAX and irowslen==-1. Actually, maybe it's not that extreme if folk have code that batches into INT_MAX sizes.
But who am I to comment when this NA-in-i segfault was here all along.
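The direct check suggested here can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from this PR: `NA_INT` stands in for R's `NA_INTEGER` (which is `INT_MIN`), and `resolve_row` is a hypothetical helper name. Testing `irows[i]` before subtracting 1 means no signed arithmetic is ever performed on the NA sentinel, so nothing depends on wraparound.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for R's NA_INTEGER, which is defined as INT_MIN. */
#define NA_INT INT_MIN

/*
 * Hypothetical helper: resolve the 0-based row for position i.
 * irowslen == -1 means there is no i subset, so the row is i itself.
 * Otherwise irows holds 1-based row numbers that may contain NA.
 * Returns false when the row is NA; *ix is left untouched in that case.
 */
bool resolve_row(const int *irows, int irowslen, int i, int *ix)
{
    if (irowslen == -1) {
        *ix = i;
        return true;
    }
    if (irows[i] == NA_INT)   /* test the sentinel before any arithmetic */
        return false;
    *ix = irows[i] - 1;       /* safe: irows[i] is not INT_MIN here */
    return true;
}
```

A caller would set the group's result to NA when `resolve_row` returns false, and index the column with `ix` otherwise.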

I had the same thought yesterday, but then also thought that if overflowing from INT_MIN-1 to INT_MAX works in the first place, then the return overflow in the if should also work.

Vectors reaching length INT_MAX are certainly a future problem, but one that we have to keep in mind, e.g. by switching to 64-bit integers for indexes? For now I'm not convinced folks have vectors that big, since this would mean a ridiculous amount of RAM.

print(object.size(seq.int(2^31-1)), unit="MB")
8192 Mb
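The 8192 Mb figure follows from simple arithmetic: an integer vector stores 4 bytes per element, so 2^31-1 elements need just under 8 GiB, ignoring the small vector header. A sketch of that calculation, assuming a 32-bit `int` and using a hypothetical helper `int_vector_mb`; the product is computed in 64 bits because it would not fit in a 32-bit `int`:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: payload size, in MB (2^20 bytes), of n 4-byte
   integers. The multiplication is done in 64-bit arithmetic so that
   n close to INT_MAX cannot overflow. */
double int_vector_mb(int64_t n)
{
    const int64_t bytes = n * (int64_t)sizeof(int32_t);
    return (double)bytes / (1024.0 * 1024.0);
}
```

Calling `int_vector_mb(INT_MAX)` gives a value a hair below 8192, which matches the rounded size reported above.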

Base R itself seems to have problems with vectors that long, e.g. x = seq.int(2^32); print(x) leads to

[ reached getOption("max.print") -- omitted -99999 entries ]

Yes, overflow/underflow is reliable in practice but, iirc, the C standard only guarantees it for unsigned, not signed, integers. So the concern would be i) weird and wonderful architectures and compilers, and ii) catching unintentional overflow. UBSAN would catch it in CRAN_Release.cmd before release, if a test covered the overflow. UBSAN is also in CRAN extra tests (https://cran.r-project.org/web/checks/check_issue_kinds.html) so compliance is required by CRAN. I've seen overflow caught by UBSAN before and fixed it; it was probably on smaller types like int8_t or char where a test covered it.
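To make the signed versus unsigned distinction concrete, here is a minimal sketch; the helper names `checked_add_int` and `wrapping_add_uint` are hypothetical and not part of data.table. The signed helper tests the operands against `INT_MAX` and `INT_MIN` before adding, so an overflowing signed expression is never evaluated. The unsigned helper relies on wraparound, which is well defined for unsigned types.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b without ever evaluating an overflowing signed expression.
   Returns false, leaving *out untouched, if the sum is not representable. */
bool checked_add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
unsigned int wrapping_add_uint(unsigned int a, unsigned int b)
{
    return a + b;
}
```

Compiling a test that exercises a plain `INT_MAX + 1` with `-fsanitize=undefined` on GCC or Clang reports the signed overflow at run time, which is the kind of finding UBSAN produces.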

That problem is just the printing mechanism. Base R can create and use 'big' vectors but the capability varies by function. R's news file is a good place to search to see how they've increased support over time. But regardless, I had in mind a regular data.table with INT_MAX rows.

ben-schwen added 2 commits August 22, 2021 18:32

init fix

150c898

fixed gforce edge case

772337a

fixed spacing

97b6b5f

ben-schwen added the bug, GForce (issues relating to optimized grouping calculations) and segfault labels Aug 22, 2021

loop for tests, simpler test than the issue by using 4 instead of 10, and by=grp rather than by=list(vv=1:10)

46afd33

mattdowle added this to the 1.14.1 milestone Aug 23, 2021

news item tweak

a4259e4

mattdowle reviewed Aug 23, 2021

mattdowle added 8 commits August 23, 2021 11:55

gminmax: avoid relying on over/underflow and reduce logic

456e492

trailing whitespace

183456b

same for gmedian

b8c0873

same for glast, and folded gfirst into gfirstlast

be019fc

gnthvalue folded into gfirstlast

798a2b3

gvarsd1: reduced narm repeated logic, R API outside loops, avoided overflow in NA-in-irows fix

87878d0

gprod: similar

da78a08

news item tweak

0725559

mattdowle merged commit b33dee6 into master Aug 24, 2021

mattdowle deleted the gforce_edge_case branch August 24, 2021 00:02

ben-schwen mentioned this pull request Sep 11, 2021

add na.rm argument to first/last and gfirst/glast #4730

Closed

shrektan mentioned this pull request Jan 15, 2022

Segfault on out-of-bound i and gforce #5312

Closed

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023

mattdowle merged 13 commits into master from gforce_edge_case

3 participants
