I noticed that data.table has been warning on CRAN for r-devel-linux-x86_64-debian-gcc. That image uses GCC 12 and since GCC 12, -Waddress now warns in more cases when comparing a non-constant pointer to NULL (gcc has always optimized this out), see here. This picked up on a missing dereference in fread (line 1287):
|
if (**nastr == '\0') { |
|
blank_is_a_NAstring = true; |
|
// if blank is the only one, as is the default, clear NAstrings so that doesn't have to be checked |
|
if (nastr==NAstrings && nastr+1==NULL) NAstrings=NULL; |
|
nastr++; |
|
continue; |
I've looked at it a bit and I think I get what Matt was trying to do there with that optimisation but I don't have a test case that would demonstrate any speed improvement from it. For small files, I can't see any difference. So unless someone has a good test case, I'd suggest just removing this line and the associated check in
end_NA_string.
(The comment in there is also currently incorrect, since the default is still 'NA' and not '\0'. Also note that atm, adding in the dereference can lead to a NULL pointer dereference a bit further down.)
I noticed that
data.tablehas been warning on CRAN for r-devel-linux-x86_64-debian-gcc. That image uses GCC 12 and since GCC 12,-Waddressnow warns in more cases when comparing a non-constant pointer to NULL (gcc has always optimized this out), see here. This picked up on a missing dereference infread(line 1287):data.table/src/fread.c
Lines 1284 to 1289 in e9a323d
I've looked at it a bit and I think I get what Matt was trying to do there with that optimisation but I don't have a test case that would demonstrate any speed improvement from it. For small files, I can't see any difference. So unless someone has a good test case, I'd suggest just removing this line and the associated check in
end_NA_string.(The comment in there is also currently incorrect, since the default is still
'NA'and not'\0'. Also note that atm, adding in the dereference can lead to a NULL pointer dereference a bit further down.)