Conversation
Codecov Report
@@ Coverage Diff @@
## master #5109 +/- ##
==========================================
- Coverage 99.54% 99.53% -0.01%
==========================================
Files 76 76
Lines 14623 14468 -155
==========================================
- Hits 14556 14401 -155
Misses 67 67
Continue to review full report at Codecov.
|
… and by=grp rather than by=list(vv=1:10)
| if (ansd[thisgrp]==NA_STRING) continue; | ||
| const int ix = (irowslen == -1) ? i : irows[i]-1; | ||
| if (xd[ix]==NA_STRING || (strcmp(CHAR(xd[ix]), CHAR(ansd[thisgrp]))<0)==min) | ||
| if (ix+1==NA_INTEGER) SET_STRING_ELT(ans, thisgrp, NA_STRING); |
There was a problem hiding this comment.
INT_MAX+1 overflows to INT_MIN (==NA_INTEGER) so if there's a way to check if irows[i]==NA_INTEGER directly like the other cases that would be neater; e.g. in the extreme edge case of nrow==INT_MAX and irowslen==-1. Actually, maybe it's not as extreme if folk have code that batches into INT_MAX sizes.
But who am I to comment when this NA-in-i segfault was here all along.
There was a problem hiding this comment.
I had the same thought yesterday, but then also thought that if overflowing from INT_MIN-1 to INT_MAX works in the first place, then the return overflow for the if should also work.
The problem vectors reaching length INT_MAX is certainly a future problem but one that we have to keep in mind, e.g. switching to 64bit integer for indexes? For now I'm not convinced folks have that kind of big vectors since this would mean an ridiculous amount of RAM.
print(object.size(seq.int(2^31-1)), unit="MB")
8192 Mb
base itself seems to have problems with vectors being that long e.g. x = seq.int(2^32); print(x) leading to
[ reached getOption("max.print") -- omitted -99999 entries ]
There was a problem hiding this comment.
Yes overflow/underflow is reliable in practice but, iirc, the C standard only guarantees it for unsigned not signed. So the concern would be i) weird and wonderful architectures and compilers, and ii) to catch unintentional overflow. UBSAN would catch it in CRAN_Release.cmd before release, if a test covered the overflow. UBSAN is also in CRAN extra tests (https://cran.r-project.org/web/checks/check_issue_kinds.html) so compliance is required by CRAN. I've seen overflow caught by UBSAN before and fixed it, it was probably on smaller types like int8_t or char where a test covered it.
That problem is just the printing mechanism. Base R can create and use 'big' vectors but the capability varies by function. R's news file is a good place to search to see how they've increased support over time. But regardless, I had in mind a regular data.table with INT_MAX rows.
…erflow in NA-in-irows fix
Closes #1994.
Considered functions:
gsumgmeangvargsdgmediangprodgminmaxgfirst/gheadglast/gtail