fread zipped files stored online #5316

@rafapereirabr

Hi all. You have done fantastic work with fread, and I know it can read both .csv files stored online and zipped .csv files stored locally. In the first case, fread automatically downloads the file, while in the second case it automatically unzips the file. My question is whether it would be possible to read a zipped .csv file that is stored online in a similar way, with a single fread call.

reproducible example:

library(data.table)

online_file <- 'https://www.gov.br/anac/pt-br/assuntos/regulados/empresas-aereas/envio-de-informacoes/microdados/basica2000-02.zip'

# download to a local file (binary mode keeps the archive intact on Windows)
local_file <- tempfile(fileext = '.zip')
utils::download.file(url = online_file, destfile = local_file, mode = 'wb')

# read zipped file stored locally
read_local <- fread( cmd = paste0('unzip -p ', local_file) )

# attempts to read zipped file stored online
read_online1 <- fread( online_file )
read_online2 <- fread( cmd = paste0('unzip -p ', online_file) )
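
In the meantime, the download and unzip steps above could be wrapped in a small helper so that the call site looks like a single fread call. This is only a sketch of a possible workaround, not something data.table provides: fread_zip_url is a hypothetical name, and the code assumes the archive contains exactly one delimited text file.

```r
# Hypothetical helper (not part of data.table): download a zipped file,
# extract its single member to a temporary directory, and read it with fread().
fread_zip_url <- function(url, ...) {
  zip_path <- tempfile(fileext = ".zip")
  out_dir  <- tempfile("unzipped_")
  # Remove the temporary archive and extraction directory on exit.
  on.exit(unlink(c(zip_path, out_dir), recursive = TRUE), add = TRUE)

  # Binary mode so the archive is not corrupted on Windows.
  utils::download.file(url, destfile = zip_path, mode = "wb", quiet = TRUE)

  # Assumption: the archive holds exactly one file.
  members <- utils::unzip(zip_path, list = TRUE)$Name
  if (length(members) != 1L) {
    stop("expected exactly one file in the archive, found ", length(members))
  }

  # unzip() creates out_dir if needed and returns the extracted path.
  csv_path <- utils::unzip(zip_path, files = members, exdir = out_dir)

  # Extra arguments (sep, encoding, select, ...) are passed on to fread().
  data.table::fread(csv_path, ...)
}
```

With this in place, `read_online <- fread_zip_url(online_file)` would stand in for the failed attempts above. It extracts with utils::unzip rather than the external unzip command, so it does not depend on that program being on the PATH.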

sessionInfo:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8         pillar_1.6.4       compiler_4.1.1     r5r_0.6.0         
 [5] prettyunits_1.1.1  remotes_2.4.2      class_7.3-19       easypackages_0.1.0
 [9] tools_4.1.1        testthat_3.1.1     pkgbuild_1.3.1     pkgload_1.2.4     
[13] tibble_3.1.6       memoise_2.0.1      lifecycle_1.0.1    pkgconfig_2.0.3   
[17] rlang_0.4.12       cli_3.1.0          DBI_1.1.2          curl_4.3.2        
[21] fastmap_1.1.0      rJava_1.0-6        e1071_1.7-9        dplyr_1.0.7       
[25] withr_2.4.3        generics_0.1.1     vctrs_0.3.8        desc_1.4.0        
[29] fs_1.5.2           devtools_2.4.3     sfheaders_0.4.0    tidyselect_1.1.1  
[33] grid_4.1.1         classInt_0.4-3     rprojroot_2.0.2    glue_1.6.0        
[37] sf_1.0-5           R6_2.5.1           processx_3.5.2     fansi_1.0.2       
[41] sessioninfo_1.2.2  callr_3.7.0        purrr_0.3.4        magrittr_2.0.1    
[45] units_0.7-2        ps_1.6.0           ellipsis_0.3.2     usethis_2.1.5     
[49] assertthat_0.2.1   utf8_1.2.2         KernSmooth_2.23-20 proxy_0.4-26      
[53] cachem_1.0.6       crayon_1.4.2  
