Race condition causes fread to read the data incorrectly occasionally #2260

@st-pasha

This example uses a dataset with 4209 rows and 378 columns.

library(data.table)

path = "~/github/h2oai/tests/data/mercedesbenz.csv"

# Reference checksums taken from the first read of the file
f = fread(path)
s385 = sum(f$X385)
s380 = sum(f$X380)
s200 = sum(f$X200)

# Re-read the same file many times; a checksum mismatch means fread returned different data
for (b in 1:10000) {
  f = fread(path)
  if (sum(f$X385) != s385) stop("Checksum on column X385 failed at b=", b)
  if (sum(f$X380) != s380) stop("Checksum on column X380 failed at b=", b)
  if (sum(f$X200) != s200) stop("Checksum on column X200 failed at b=", b)
}

In my latest run the first failure occurred at b=3498. Here is an excerpt of the data in column X385, which should contain only 0s and 1s:

[2737]           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
[2753]           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
[2769]           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
[2785]           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
[2801]           0           0           0           0           0           0 -1165986440       32747 -1131141416       32747 -1131141416       32747 -1172970760       32747 -1172189800       32747
[2817] -1131141416       32747 -1130042776       32747 -1172970760       32747 -1165986440       32747 -1172189800       32747 -1130042776       32747 -1165986440       32747 -1172189800       32747
[2833] -1165986440       32747 -1172970760       32747 -1130042776       32747 -1165986440       32747 -1130042776       32747 -1165986440       32747 -1130042776       32747 -1165986440       32747
[2849] -1172189800       32747 -1172970760       32747 -1130042776       32747 -1172189800       32747 -1165986440       32747 -1172970760       32747 -1131141416       32747 -1172970760       32747
[2865] -1172189800       32747 -1172970760       32747 -1165986440       32747 -1130042776       32747 -1165986440       32747 -1131141416       32747 -1165986440       32747 -1165986440       32747
[2881] -1172189800       32747 -1130042776       32747 -1165986440       32747 -1130042776       32747 -1172189800       32747 -1131141416       32747 -1172970760       32747 -1172970760       32747
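
A range check can locate the corrupted rows more directly than a checksum, since it reports the first offending index and value. The following is only a minimal sketch, assuming the column should hold nothing but 0s and 1s as stated above; `first_bad_value` is a hypothetical helper written for illustration and is not part of data.table.

```r
# Hypothetical helper (not part of data.table): report where a column
# first departs from its allowed set of values.
first_bad_value <- function(x, allowed = c(0L, 1L)) {
  # Treat NA and anything outside `allowed` as corrupted
  bad <- which(is.na(x) | !(x %in% allowed))
  if (length(bad) == 0L) return(NULL)
  list(index = bad[1L], value = x[bad[1L]], count = length(bad))
}

# Small vector built from values that appear in the excerpt above
x <- c(0L, 0L, 0L, -1165986440L, 32747L, -1131141416L)
res <- first_bad_value(x)
```

Inside the reproduction loop, a call such as `first_bad_value(f$X385)` could replace or supplement the checksum comparison, so that a failing iteration also reports which row first went wrong.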
