fread(nrows = 0) for a particular file enters infinite loop in dev #5868

@HughParsonage

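For one particular file (attached as temp.csv), calling `fread()` with `nrows = 0` enters an infinite loop. In the verbose output, the allocation is first limited to the `nrows = 0` passed in. Step [11] then repeatedly reports "Too few rows allocated", tries to allocate 72842 additional rows, still ends up at `nrows=0`, and restarts reading from jump 0. `fread()` with any other value of `nrows` appears to work.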
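Any other value of `nrows` reportedly works, so one possible stopgap until this is fixed is to read a single data row and then drop it. This keeps only the column names and whatever types `fread()` detects on that read, which may not match a full read. The sketch below is an assumption, not a confirmed fix. It writes a small stand-in CSV with `tempfile()` so that it runs on its own. The attached temp.csv is not reproduced, and the stand-in's contents are hypothetical.

```r
library(data.table)

# Hypothetical stand-in for temp.csv (the real attachment is not reproduced here);
# the first two column names mirror the header shown in the verbose log.
f <- tempfile(fileext = ".csv")
fwrite(data.table(ADDRESS_ALIAS_PID = c("a1", "a2", "a3"),
                  DATE_CREATED      = c("2023-01-01", "2023-01-02", "2023-01-03")),
       f)

# Workaround sketch: nrows = 1 is reported to work, so read one data row and
# then subset to zero rows, keeping the column names and detected column types.
header_only <- fread(f, nrows = 1L)[0L]

str(header_only)
```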

# Minimal reproducible example; please be sure to set verbose=TRUE where possible!

```r
fread("temp.csv", nrows = 0, verbose = TRUE)
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            12
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          12
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 6 threads with throttle==1024. See ?setDTthreads.
freadR.c has been passed a filename: ~/temp.csv
[01] Check arguments
  Using 6 threads (omp_get_max_threads()=12, nth=6)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file C:/Users/hughp/Documents/temp.csv
  File opened, size = 3.317MB (3477693 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<ADDRESS_ALIAS_PID,DATE_CREATED>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 100 lines of 7 fields using quote rule 0
  Detected 7 columns on line 1. This line is either column names or first data row. Line starts as: <<ADDRESS_ALIAS_PID,DATE_CREATED>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 7
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 100 because nrow limit (0) supplied
  Type codes (jump 000)    : 6B1DDD1  Quote rule 0
  Type codes (jump 006)    : 6B1DDDD  Quote rule 0
  Type codes (jump 100)    : 6B1DDDD  Quote rule 0
  'header' determined to be true due to column 1 containing a string on row 1 and a lower type (int32) in the rest of the 10050 sample rows
  =====
  Sampled 10050 rows (handled \n inside quoted fields) at 101 jump points
  Bytes from first data row on line 2 to the end of last row: 3477592
  Line length: mean=57.80 sd=1.23 min=55 max=74
  Estimated number of rows: 3477592 / 57.80 = 60170
  Initial alloc = 66187 rows (60170 + 10%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
  Alloc limited to lower nrows=0 passed in.
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : 6B1DDDD
[10] Allocate memory for the datatable
  Allocating 7 column slots (7 - 0 dropped) with 0 rows
[11] Read the data
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
  Too few rows allocated. Allocating additional 72842 rows (now nrows=0) and continue reading from jump 0
  jumps=[0..3), chunk_size=1159197, total_size=3477592
```
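The repeating lines in step [11] match a simple pattern. The reader runs out of room and grows the allocation, but the new size is clamped back to the `nrows` limit, so nothing changes between retries. The toy model below reproduces that pattern. It is only an illustration inferred from the log, not data.table's actual C code. The function `simulate_alloc_loop` and its parameters are hypothetical. The model assumes the reader needs at least one row of space before it can finish. The numbers 66187 and 72842 are taken from the log above.

```r
# Hypothetical toy model of the allocate-and-retry loop suggested by the verbose log.
# Assumption: each retry adds `extra` rows, the total is re-clamped to the user's
# nrows limit, and reading only finishes once the allocation covers `rows_needed`.
simulate_alloc_loop <- function(rows_needed, nrow_limit, initial_alloc, extra,
                                max_iter = 10L) {
  alloc <- min(initial_alloc, nrow_limit)   # cf. "Alloc limited to lower nrows=0 passed in."
  for (i in seq_len(max_iter)) {
    if (rows_needed <= alloc) {
      return(list(finished = TRUE, retries = i - 1L, alloc = alloc))
    }
    # cf. "Too few rows allocated. Allocating additional ... rows (now nrows=...)"
    alloc <- min(alloc + extra, nrow_limit)
  }
  # Gave up after max_iter retries: the loop made no progress.
  list(finished = FALSE, retries = max_iter, alloc = alloc)
}

res0 <- simulate_alloc_loop(rows_needed = 1, nrow_limit = 0,
                            initial_alloc = 66187, extra = 72842)
res1 <- simulate_alloc_loop(rows_needed = 1, nrow_limit = 1,
                            initial_alloc = 66187, extra = 72842)
res0$finished  # FALSE: the allocation stays at 0 on every retry
res1$finished  # TRUE: a limit of 1 leaves room for the first row
```

In this model, the retry guard `max_iter` is the only thing that stops the `nrow_limit = 0` case, which mirrors the unbounded repetition in the log.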

# Output of sessionInfo()

```r
> library(data.table)
data.table 1.14.99 IN DEVELOPMENT built 2023-12-18 13:30:51 UTC using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.99

loaded via a namespace (and not attached):
[1] compiler_4.2.1 tools_4.2.1
```

temp.csv