In a pull request I was asked to open an issue on this subject, so here it is.
The reason for the proposed patch for fread is that C/C++ code compiled with a Visual Studio compiler will output e.g. 1.#INF instead of inf, or 1.#IND instead of nan, when writing such floating point values to a text file using a printf-like function. Since R compiles its packages using gcc, strtod will not recognize these strings as doubles, causing entire columns to be interpreted as text instead of numbers.
The current patch always performs this check, and therefore causes a slowdown of 2.2%. I've also tried making it optional through an extra boolean argument to fread. That argument sets a function pointer either to strtod directly, or to the modified code in strtod_wrapper that performs the extra checks. However, even such a straightforward change still causes a slowdown of 1.5% when the extra checks are not used.
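
To make the optional variant concrete, here is a minimal sketch of the function pointer approach. It is not the code from the patch: the names strtod_msvc, select_parser and msvc_compat are hypothetical, and the list of recognized spellings (1.#INF, 1.#IND, 1.#QNAN, 1.#SNAN) is an assumption about what the Visual Studio runtime emits. Only a prefix is matched, so padded variants with trailing digits are not fully consumed.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: checks for the Visual Studio spellings of
   infinity and NaN first, and otherwise defers to the standard strtod. */
static double strtod_msvc(const char *s, char **end)
{
    /* Assumed list of special spellings; is_nan selects NaN vs. infinity. */
    static const struct { const char *text; int is_nan; } special[] = {
        { "1.#INF", 0 }, { "1.#IND", 1 }, { "1.#QNAN", 1 }, { "1.#SNAN", 1 }
    };
    const char *p = s;
    int negative = 0;

    /* Skip leading blanks and an optional sign. */
    while (*p == ' ' || *p == '\t') p++;
    if (*p == '+' || *p == '-') { negative = (*p == '-'); p++; }

    for (size_t i = 0; i < sizeof special / sizeof special[0]; i++) {
        size_t n = strlen(special[i].text);
        if (strncmp(p, special[i].text, n) == 0) {
            /* Report how far we parsed, as strtod does. */
            if (end) *end = (char *)(p + n);
            if (special[i].is_nan) return NAN;
            return negative ? -INFINITY : INFINITY;
        }
    }
    /* Not a special spelling: let strtod parse from the original start. */
    return strtod(s, end);
}

/* Both parsers share the strtod signature, so one pointer can hold either. */
typedef double (*parse_double_fn)(const char *, char **);

/* Hypothetical selector driven by the extra boolean argument. */
static parse_double_fn select_parser(int msvc_compat)
{
    return msvc_compat ? strtod_msvc : strtod;
}
```

Called once before reading, select_parser(0) returns strtod itself, so the only extra cost on that path is the indirect call through the pointer; that may be where the remaining 1.5% comes from, although I have not verified this.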