'nrows = 0L' in fread reads the whole file #4686

@hongyuanjia

According to ?fread:

nrows=0 returns the column names and typed empty columns determined by the large sample

But this does not hold when the integer 0L is passed; fread then reads the whole file:

f <- tempfile(fileext = ".csv")
fwrite(mtcars, f)

fread(f, nrows = 0)
# Empty data.table (0 rows and 11 cols): mpg,cyl,disp,hp,drat,wt...
fread(f, nrows = 0L)
#       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#     <num> <int> <num> <int> <num> <num> <num> <int> <int> <int> <int>
#  1:  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#  2:  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
#  3:  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#  4:  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#  5:  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#  6:  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
#  7:  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
#  8:  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#  9:  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
# 10:  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
# 11:  17.8     6 167.6   123  3.92 3.440 18.90     1     0     4     4
# 12:  16.4     8 275.8   180  3.07 4.070 17.40     0     0     3     3
# 13:  17.3     8 275.8   180  3.07 3.730 17.60     0     0     3     3
# 14:  15.2     8 275.8   180  3.07 3.780 18.00     0     0     3     3
# 15:  10.4     8 472.0   205  2.93 5.250 17.98     0     0     3     4
# 16:  10.4     8 460.0   215  3.00 5.424 17.82     0     0     3     4
# 17:  14.7     8 440.0   230  3.23 5.345 17.42     0     0     3     4
# 18:  32.4     4  78.7    66  4.08 2.200 19.47     1     1     4     1
# 19:  30.4     4  75.7    52  4.93 1.615 18.52     1     1     4     2
# 20:  33.9     4  71.1    65  4.22 1.835 19.90     1     1     4     1
# 21:  21.5     4 120.1    97  3.70 2.465 20.01     1     0     3     1
# 22:  15.5     8 318.0   150  2.76 3.520 16.87     0     0     3     2
# 23:  15.2     8 304.0   150  3.15 3.435 17.30     0     0     3     2
# 24:  13.3     8 350.0   245  3.73 3.840 15.41     0     0     3     4
# 25:  19.2     8 400.0   175  3.08 3.845 17.05     0     0     3     2
# 26:  27.3     4  79.0    66  4.08 1.935 18.90     1     1     4     1
# 27:  26.0     4 120.3    91  4.43 2.140 16.70     0     1     5     2
# 28:  30.4     4  95.1   113  3.77 1.513 16.90     1     1     5     2
# 29:  15.8     8 351.0   264  4.22 3.170 14.50     0     1     5     4
# 30:  19.7     6 145.0   175  3.62 2.770 15.50     0     1     5     6
# 31:  15.0     8 301.0   335  3.54 3.570 14.60     0     1     5     8
# 32:  21.4     4 121.0   109  4.11 2.780 18.60     1     1     4     2
#       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
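
Until this is resolved, a possible caller-side workaround is to coerce the row count to double before calling fread, since the double 0 is the case shown above to behave as documented. This is only a minimal sketch: the variable names n and header_only are hypothetical, and it assumes the difference is purely about the type of the nrows argument, which the report suggests but does not confirm.

```r
library(data.table)

f <- tempfile(fileext = ".csv")
fwrite(mtcars, f)

# Hypothetical integer row count, e.g. one computed elsewhere in a script
n <- 0L

# as.numeric(0L) is identical to the double 0, which is the value
# that returned an empty, typed data.table in the example above
header_only <- fread(f, nrows = as.numeric(n))

# Inspect the column names and detected types without reading any rows
str(header_only)
```

The coercion is harmless for positive counts too, so it can be applied unconditionally wherever nrows might arrive as an integer.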

Output of sessionInfo()

# R version 3.6.2 (2019-12-12)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Arch Linux
# 
# Matrix products: default
# BLAS:   /usr/lib/libblas.so.3.9.0
# LAPACK: /usr/lib/liblapack.so.3.9.0
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] eplusr_0.12.0.9000 testthat_2.3.2     nvimcom_0.9-98    
# 
# loaded via a namespace (and not attached):
#  [1] progress_1.2.2          xfun_0.16              
#  [3] remotes_2.2.0           vctrs_0.3.2            
#  [5] generics_0.0.2          miniUI_0.1.1.1         
#  [7] usethis_1.5.1.9000      htmltools_0.5.0        
#  [9] blob_1.2.1              rlang_0.4.7            
# [11] pkgbuild_1.1.0          manipulateWidget_0.10.0
# [13] later_1.0.0             glue_1.4.1             
# [15] withr_2.2.0             DBI_1.1.0              
# [17] bit64_0.9-7             sessioninfo_1.1.1      
# [19] devtools_2.2.2          htmlwidgets_1.5.1      
# [21] memoise_1.1.0           knitr_1.29             
# [23] callr_3.4.3             fastmap_1.0.1          
# [25] httpuv_1.5.2            ps_1.3.4               
# [27] crosstalk_1.0.0         parallel_3.6.2         
# [29] fansi_0.4.1             Rcpp_1.0.5             
# [31] xtable_1.8-4            backports_1.1.8        
# [33] promises_1.1.0          checkmate_2.0.0        
# [35] desc_1.2.0              pkgload_1.1.0          
# [37] webshot_0.5.2           jsonlite_1.7.0         
# [39] mime_0.9                fs_1.4.2               
# [41] bit_1.1-15.2            hms_0.5.3              
# [43] digest_0.6.25           stringi_1.4.6          
# [45] processx_3.4.3          shiny_1.4.0            
# [47] rprojroot_1.3-2         cli_2.0.2              
# [49] tools_3.6.2             magrittr_1.5           
# [51] rgl_0.100.30            RSQLite_2.2.0          
# [53] crayon_1.3.4            pkgconfig_2.0.3        
# [55] ellipsis_0.3.1          data.table_1.12.4      
# [57] prettyunits_1.1.1       decido_0.2.0           
# [59] lubridate_1.7.8         assertthat_0.2.1       
# [61] rstudioapi_0.11         R6_2.4.1               
# [63] units_0.6-6             compiler_3.6.2         
